Introduction


When we have to add, or multiply, even big numbers everything
goes almost mechanically. This is a routine work, ..., the true
mathematical thinking begins when one has to solve a real problem,
that is to say, to identify a mathematical structure that would
match the conditions of the problem, to understand principles of
its functioning, to grasp connections with other mathematical
structures, and to deduce the consequences implied by the logic
of the problem. Such manipulations of structures are always
immersed into various calculations since calculations form a
natural language of mathematical structures. Michel Heller (2008)

This "compendium" is for those who, like me, are engaged in
practical laboratory work, do not have a major in statistical
analysis, and feel somewhat uncomfortable with the statistical
jargon. We frequently face the need to analyze large amounts of
data of various origins, collected for various purposes in
routine or research work, and have discovered the power of
spreadsheet programs in calculations and general data analysis.

Commercial statistical "packages" provide many of the
analyses used in the laboratory. By necessity, the organization
of the data in these packages has to accommodate many different
requirements and is perhaps not optimal for a particular
practical purpose. Laboratorians often desire to visualize
their results graphically and interactively. The availability of
spreadsheet programs has eliminated much of the trouble and
hassle with calculations in statistics, provided simple,
understandable formulas are available. Indeed, simple spreadsheet
programming can satisfy most of the necessary calculations
and offer simple, efficient, and customized solutions.

This compendium is not meant to be a "short course" in
statistics but a source for quick reference, repetition, or
explanation of formulas and concepts, and an encouragement to
develop statistical tools and routines in research and routine
laboratories.


Special attention has been given to expressions that can take
different formats but, of course, give the same results. Presenting
formulas in different formats may to some extent explain their
origin, relation to other procedures, and their usage. We have
tried to align formulas regarding style and terminology and
group them in a logical order. Some formulas in the collection
have been edited to facilitate their use in spreadsheet programs.

The selection of formulas in the compendium has developed 
during several courses in applied statistics for laboratorians 
and scientists with experimental projects. The number of 
worked examples is extensive and regularly enhanced by 
tables and figures. Whenever feasible the text makes reference 
to functions and routines in Microsoft EXCEL®. 

Formulas have been collected and compared from many different
sources, scientific literature, common textbooks, and the
Internet. It is all out there, cast in different forms and shapes
but may be difficult to find. An idea with this compendium
is to have most of the statistical procedures used in the
laboratory collected in one source and described in a standardized
but not compressed format.

References to individual sources are not given; instead, a list
of contemporary literature is provided.

A threat with preprogrammed routines is that they will always
produce an answer unless simple rules are violated and their use
is thereby prevented. The process of programming and calculating
statistical routines has proved to deepen the understanding of
the procedures and will hopefully diminish erroneous use of
established procedures. However, the author takes no
responsibility for any erroneous decisions based on
calculations using formulas in this compendium.

A comprehensive list of contents and an index facilitate
access to the desired concept or procedure.


VOCABULARY AND CONCEPTS 
IN METROLOGY 


Many organizations have invested heavily in formulating
internationally acceptable, clear, comprehensive, and
understandable definitions of terms in metrology. Superficially,
this may not seem to have any bearing on statistics. Basically,
statistics is one way of formulating and expressing mathematical
relationships, but we also need to agree on and use definitions
of common concepts. The most extensive and internationally
recognized list of concepts and their definitions is that created 
by the joint BIPM, ISO, IEC, IFCC, IUPAC, IUPAP, OIML, and 
ILAC document International Vocabulary of Metrology—Basic 
and General Concepts and Associated Terms(VIM), downloadable 
at http://www.bipm.org/ (accessed 2013-06-30). 

The definitions are reproduced in extenso from the VIM, but
some notes have been deleted when pertaining to pure
metrological problems.

The author is grateful for the interest and many excellent
suggestions from students and other users of previous editions of
the compendium. In particular, Professor Elvar Theodorsson,
Department of Clinical Chemistry, University of Linköping,
Sweden, has provided healthy criticism.

Anders Kallner (anders.kallner@ki.se) 



Some Notes on Nomenclature 


Mathematical formulas may be difficult to decipher but are 
in fact unambiguous and comprehensive. 

In this compendium, the formulas are not as compressed as 
they may be and therefore easier to understand. A few rules 
may help: 

The number of items is abbreviated n or N. 

> is read "larger than," < "smaller than," ≥ "larger than or
equal to," ≤ "smaller than or equal to."

≫ is read "much larger than," ≪ "much smaller than."

Fractions (division): a/b; multiplication: a × b.

Multiplications in the body of the text are written ×, i.e.,
a × b.

Square root: $\sqrt{a}$ or explicit $\sqrt[2]{a}$, which allows for
higher order roots.

Sum: $a_1 + a_2 + a_3 + \cdots + a_n$ is abbreviated
$\sum_{i=1}^{n} a_i$.

Sum of squares: $(a_1)^2 + (a_2)^2 + (a_3)^2 + \cdots + (a_n)^2$ is
abbreviated $\sum_{i=1}^{n} a_i^2$.

A squared sum $(a_1 + a_2 + a_3 + \cdots + a_n)^2$ is
$\left(\sum_{i=1}^{n} a_i\right)^2$.

Absolute value: $|a|$, i.e., disregarding any sign.

Standard deviation of a sample x is s(x), s(X), or s if there is 
no risk for misunderstanding. 

Consequently, the standard error of the mean (SEM) is $s(\bar{x})$
or $s(\bar{X})$, as appropriate. The abbreviation SEM is also used.

The period (full stop) "." is used as the decimal sign and a 
comma "," as the 1000 separator. 

Additional abbreviations as appropriate are explained in 
the text. 


Some Greek letters used for certain purposes (small and capital):

Alpha: α and Α, Beta: β and Β, Gamma: γ and Γ, Delta: δ and Δ,
Epsilon: ε and Ε, Zeta: ζ and Ζ, Eta: η and Η, Kappa: κ and Κ,
Lambda: λ and Λ, Mu: μ and Μ, Xi: ξ and Ξ, Pi: π and Π, Rho: ρ
and Ρ, Sigma: σ and Σ, Tau: τ and Τ, Chi: χ and Χ.



Formulas 


BASICS 


Logarithms and Exponents 

The logarithm of a given number and a given base is the 
power to which the base must be raised to get the number. 

If b is the base and a the given number, the logarithm is x. In
many applications, the notation “log” refers to 10-logarithms
(Briggs), i.e., the base 10, and “ln” refers to e-logarithms or
“natural” logarithms with e = 2.7183 as the base:


If $^{b}\log(a) = x$, then $\operatorname{antilog}(x) = a = b^{x}$   (1)

thus if $^{e}\log(a) = x$; $\ln(a) = x$; then $\operatorname{antiln}(x) = a = e^{x}$   (2)

and

if $^{10}\log(a) = x$, then $\operatorname{antilog}(x) = a = 10^{x}$   (3)

$a \times b = c$; $\log(a) + \log(b) = \log(c)$;

$\frac{a}{b} = c$; $\log(a) - \log(b) = \log(c)$   (4)

$\log(a^{b}) = b \times \log(a)$   (5)



( 6 ) 
$a^{-n} = \frac{1}{a^{n}}$   (7)

$a^{1/n} = \sqrt[n]{a}$; $\log\left(\sqrt[n]{a}\right) = \frac{1}{n} \times \log(a)$   (8)


Microsoft EXCEL® commands: Natural logarithm: LN(a);
antilog: EXP(LN(a)) (cf. 2).

10-logarithms (Briggs): LOG(a); antilog: 10^LOG(a) (cf. 3).

Value of e = e^1: EXP(1) = 2.7183; e^b: EXP(b).

Examples 

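Let a = 5, b = 10, c = 3, and n = 2, then

$^{10}\log(5) = 0.6990$; $\operatorname{antilog}(0.6990) = 5 = 10^{0.6990}$

Since e = 2.7183 and $^{e}\log(5) = \ln(5) = 1.61$;
$\operatorname{antiln}(1.61) = 5 = e^{1.61} = 2.7183^{1.61}$

$5 \times 10 = 50$; $\log(5) + \log(10) = \log(50)$;

$0.6990 + 1 = 1.6990$; $\operatorname{antilog}(1.6990) = 50$

$\frac{5}{10} = 0.5$; $\log(5) - \log(10) = \log(0.5)$;

$0.6990 - 1 = -0.3010$; $\operatorname{antilog}(0.6990 - 1) = 0.5$

$\log(5^{10}) = 10 \times \log(5) = 6.990$; $^{10}\log(5) = 0.6990$

$5^{-2} = \frac{1}{5^{2}} = \operatorname{antilog}(\log(1) - 2 \times \log(5)) = \operatorname{antilog}(0 - 2 \times 0.699)$
$= \operatorname{antilog}(-1.398) = \operatorname{antilog}(0.602 - 2) = 0.04$

$8^{1/3} = \sqrt[3]{8} = \operatorname{antilog}\left(\frac{1}{3} \times \log(8)\right) = \operatorname{antilog}\left(\frac{1}{3} \times 0.9031\right)$
$= \operatorname{antilog}(0.3010) = 2$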

Calculation of logarithms, natural or 10-logarithms, is
directly available in spreadsheet programs. If mathematical
tables or calculators are used, logarithms are conventionally
expressed with four decimals to achieve sufficient precision
for everyday use. Table values can be interpolated.
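The rules and worked examples above can also be checked numerically outside a spreadsheet. The following is a minimal sketch in Python (not a tool used in this compendium; the variable names are chosen here only for illustration) that relies on the standard math module.

```python
import math

a, b = 5.0, 10.0

# Rule (3): the 10-logarithm and its antilog
log_a = math.log10(a)          # 0.6990 to four decimals
antilog_a = 10 ** log_a        # recovers a

# Rule (2): the natural logarithm and its antilog
ln_a = math.log(a)
antiln_a = math.exp(ln_a)      # recovers a

# Rule (4): the log of a product is the sum of the logs
product_via_logs = 10 ** (math.log10(a) + math.log10(b))

# Rule (5): log(a**b) = b * log(a)
power_log = math.log10(a ** b)

# Negative and fractional exponents handled through logarithms
inverse_square = 10 ** (-2 * math.log10(a))   # 5**-2
cube_root = 10 ** (math.log10(8.0) / 3)       # 8**(1/3)

print(round(log_a, 4), round(ln_a, 4))
print(round(product_via_logs, 6), round(power_log, 3))
print(round(inverse_square, 6), round(cube_root, 6))
```

Rounding is applied only when printing, so the intermediate values keep full floating-point precision, much as a spreadsheet does.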
Derivation—Calculus

The derivative of a function at a given input value describes
the best linear approximation of the function near that input
value, i.e., the slope of the tangent at that point. Therefore, if
the “first derivative” is set to zero and solved, the maximum(s)
and/or minimum(s) of the function will be obtained. Derivatives
of higher order (second, third, etc.) can also be calculated,
and if the second derivative is set to zero, the inflexion point
of the original function is identified. The derivative of a
function y = f(x) is written dy/dx, y′, or f′(x) and interpreted
as the “derivative of y with respect to x.”

The partial derivative of a function of several variables is its 
derivative with respect to one of those variables while the 
others are held constant. 

The partial derivative is written ∂y/∂x.

Examples 

The first derivative of a third degree function
$y = \frac{1}{3}x^3 - 5x^2 - 11x - 5$ is $dy/dx = y' = f'(x) = x^2 - 10x - 11$
with maximum and minimum at x = 5 ± 6, i.e., $x_1 = -1$ and
$x_2 = +11$, respectively. The second derivative is
$d^2y/dx^2 = y'' = f''(x) = 2x - 10$ and the inflexion point of the
original function is x = 5.

Draw the three functions and confirm the maximum, minimum,
and inflexion point!
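As an alternative to drawing, the example can be checked numerically. The sketch below is in Python and assumes the third degree function is y = x³/3 − 5x² − 11x − 5; the helper names slope and curvature are illustrative only. It approximates the first and second derivatives by finite differences. The slope should vanish at x = −1 and x = 11, and the second derivative 2x − 10 should change sign at x = 5.

```python
def f(x):
    """Third degree function from the example."""
    return x**3 / 3 - 5 * x**2 - 11 * x - 5


def slope(func, x, h=1e-5):
    """Central-difference approximation of the first derivative."""
    return (func(x + h) - func(x - h)) / (2 * h)


def curvature(func, x, h=1e-3):
    """Central-difference approximation of the second derivative."""
    return (func(x + h) - 2 * func(x) + func(x - h)) / h**2


for x in (-1.0, 5.0, 11.0):
    print(x, round(slope(f, x), 4), round(curvature(f, x), 4))
```

A negative second derivative at a point of zero slope indicates a maximum and a positive one a minimum.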

If $y = n \times x^{k} + \text{constant}$, then a derivative will, in general
terms, be

$\frac{dy}{dx} = n \times k \times x^{k-1}$   (9)

For a detailed discussion of derivative rules, derivatives, 
and partial derivatives, the reader is referred to special 
literature. 

TRIGONOMETRY 
Trigonometric Functions 

In a right-angle triangle, i.e., a triangle with one angle equal
to 90°, i.e., one side perpendicular to another side, the sides
surrounding the right angle are called catheti (a and b in
Figure 1A) and the opposite side the hypotenuse (c). The relation
between these sides is expressed by Pythagoras' theorem:

$a^2 + b^2 = c^2$

FIGURE 1 (A) Right-angle triangle. (B) The unit circle.

The proportions or “image” of any triangle are determined 
by the angles (A = BAC, B =ABC, and C=ACB). The angles can 
be defined by the trigonometric functions referring to a right- 
angle triangle (Figure 1A): 


$\sin A = \frac{a}{c}$; $\sin B = \frac{b}{c}$; $\cos A = \frac{b}{c}$; $\cos B = \frac{a}{c}$

$\tan A = \frac{a}{b}$; $\tan B = \frac{b}{a}$; $\cot A = \frac{b}{a}$; $\cot B = \frac{a}{b}$


Provided the angle is known and expressed in radians,
EXCEL provides numerical values of these quantities: SIN(A),
COS(A), and TAN(A). The cotangent of an angle is the reciprocal
of its tangent and is not available as a separate function
in EXCEL.

Radian is defined as the angle AOB in the circle (Figure 1B)
where the arc AB is equal to the radius OB. Since the
circumference is 2 × radius × pi (π), corresponding to 360°, an
angle of 1 radian will correspond to 360/(2 × π) or 57.3°.

EXCEL provides conversions between degrees and radians:
RADIANS(angle in degrees) and DEGREES(angle in radians),
respectively. Therefore, to express the sine of 30°, the function
would be SIN(RADIANS(30)) = SIN(0.52) = 0.5. The inverses of
the trigonometric functions are arcsine, arccosine, and
arctangent, respectively. In EXCEL, the functions are ASIN(A),
ACOS(A), and ATAN(A). Thus, to convert a sine of 0.5 to degrees,
the function would be DEGREES(ASIN(0.5)).
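The same conversions can be tried outside EXCEL. The following Python sketch uses the standard math module, whose functions also expect angles in radians. The triangle with sides 3, 4, and 5 is chosen only for illustration, and it is assumed that side a lies opposite angle A.

```python
import math

angle_deg = 30.0
angle_rad = math.radians(angle_deg)           # like RADIANS(30)

sine = math.sin(angle_rad)                    # like SIN(RADIANS(30))
back_to_deg = math.degrees(math.asin(0.5))    # like DEGREES(ASIN(0.5))
one_radian_deg = math.degrees(1.0)            # 360 / (2 * pi)
cotangent = 1.0 / math.tan(angle_rad)         # reciprocal of the tangent

# Right-angle triangle with catheti a, b and hypotenuse c
a, b = 3.0, 4.0
c = math.hypot(a, b)                          # sqrt(a**2 + b**2)
angle_A = math.degrees(math.asin(a / c))      # from sin A = a / c

print(round(angle_rad, 2), round(sine, 4), round(back_to_deg, 4))
print(round(one_radian_deg, 1), round(cotangent, 4), c, round(angle_A, 2))
```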

Scales—Types of Data 

Data can be expressed on four types of scales: nominal,
ordinal, interval, and ratio.

Data on a nominal scale may be numbers or any other
information that describes a property. There is no size relation
between the entities.

Data expressed on an ordinal scale are of different sizes and
can thus be ordered or ranked. The scale may be arbitrary and
the intervals between numbers unequal. Data expressed on an
ordinal scale can be measured and are thus quantities. Not all
statistical procedures can be applied to ordinal data. Examples
may be “good,” “excellent,” and “superior,” or +1, +2, +3 etc.
with no defined difference between the results.

Data with equal intervals between numbers are of two kinds
and can be expressed on an interval scale or a ratio scale. The
ratio scale is characterized, apart from equally sized units, by
a natural zero value, whereas the interval scale may have an
arbitrarily defined zero. A commonly cited quantity that is
expressed on an interval scale is temperature expressed as
degrees Celsius or Fahrenheit, whereas if expressed in Kelvin a
ratio scale is used. Consequently, 40 K is twice as much
as 20 K, whereas 40 °C is not twice as much as 20 °C. However,
there are as many degrees between 40 and 20 °C as
between 20 and 0 °C.

DISTRIBUTIONS OF DATA 


Histogram 

A histogram displays the number of data points in each of a set
of defined categories or intervals, often called “bins.” It is a
rough representation of the frequency (probability) distribution
of the data. The resolution and details of the distribution depend
largely on the size and number of bins. Usually, the bin sizes
are made equal in the interval of interest, but that is not always
the case. Designing a histogram manually is easy but tedious,
and EXCEL offers two different possibilities. The simpler is to
activate the “Data analysis” function, which is an add-in to the
program and found under the Data tab. This is straightforward
and allows an individual design of the bins as an option to
those calculated by the program. The routine has the
disadvantage of not allowing modifications interactively. A fully
flexible procedure is obtained by the “frequency” function.
This is an “array” function.

In short, define the desired bins, mark a set of empty cells,
one cell more than the number of bins, write in the first
cell =FREQUENCY(A1:AN₁,B1:BN₂), where N₁ and N₂ denote the last
rows of the data set and of the bins, and press Control +
Shift + Enter. The array is then created in the marked cells,
which are filled with the copied formula and subsequently
with the number of items in each bin. The array formula will
be the same in each of the marked cells. The bar graph can now
be displayed. Any changes in the bins (B1:BN₂) or the data set
(A1:AN₁) will immediately be reflected in the histogram.
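The counting behind such a histogram can be sketched in a few lines of Python. This is an illustration, not EXCEL's implementation. It assumes that each bin value acts as an upper limit and that one extra count collects the values above the highest limit, which is why one cell more than the number of bins is needed. The function name frequency and the data are hypothetical.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Count values per bin, treating each bin value as an upper limit.

    Returns len(bins) + 1 counts; the last one holds the values
    above the highest bin limit.
    """
    limits = sorted(bins)
    counts = [0] * (len(limits) + 1)
    for value in data:
        # index of the first limit that is >= value
        counts[bisect_left(limits, value)] += 1
    return counts


# Hypothetical measurement results and bin limits
data = [1.2, 1.7, 1.9, 2.2, 2.4, 2.4, 3.1, 3.8, 4.6, 5.3]
bins = [2.0, 3.0, 4.0, 5.0]

counts = frequency(data, bins)
for limit, count in zip(bins + ["more"], counts):
    print(limit, "#" * count)
```

Changing the bin limits and rerunning the script shows how strongly the apparent shape of the distribution depends on the choice of bins.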


The Normal and t-Distributions 

The general concept is “probability density function.” A spe¬ 
cial form, the Gauss distribution, is a symmetrical distribution 
around the most probable or frequent value, the mean, and a 
defined variability, the standard deviation. This distribution 
occurs if data are randomly distributed. 

Gauss (normal) distribution:

$G_{\mu,\sigma}(x) = \frac{1}{\sigma\sqrt{2\pi}} \times e^{-(x-\mu)^2/(2\sigma^2)}$   (10)

The distribution is fully defined by two distribution-related
constants, the population mean, μ, and the standard deviation,
σ, and is graphically represented by the well-known bell-shaped
curve, with the quantity values on a horizontal axis (X-axis) and
the frequency of observations on a vertical Y-axis. The peak
of the curve is the average (16) of all observations belonging
to the population and the variation or width of the distribution
is related to the standard deviation (20). The standard deviation
(s(x)) is a quantity value on the X-axis and formally obtained
by solving the second derivative of the Gauss function when
set equal to zero, i.e., an inflexion point of the function. Also
see Figure 3 for an understanding of the standard deviation.
TABLE 1 The Relation between the Probability and the z-Value

Cumulative AUC     z-Value     Remaining above the
(Probability)      (s(x))      z-Value (1 − AUC)

0.500              0.00        0.500
0.841              1.00        0.159
0.975              1.96        0.025
0.977              2.00        0.023
0.99000            2.326       0.01000
0.99900            3.090       0.00100
0.99990            3.719       0.00010
0.99999            4.265       0.00001

Consequently, mean + 1 s(x) represents a probability increase of (0.841 − 0.500) = 0.341, “the second
s(x)” (0.977 − 0.841) = 0.136, and the mean ± 2 × s(x) = 2 × (0.977 − 0.500) = 0.954. 3 s(x) is usually
approximated to a probability of 99.9% and 4 s(x) to 99.99%.


The area under the curve (AUC) in the interval −2 × s(x) to
+2 × s(x) is about 95.4 % (see Table 1).

In the discussion of the properties of the normal distribution,
a “standard normal distribution” is often used. This is
characterized by a mean or average of 0 and a standard
deviation of 1. Formula (10) is then simplified to

$G_{0,1}(x) = \frac{1}{\sqrt{2\pi}} \times e^{-x^2/2}$   (10A)

Although the shape is the same, the distribution's width and 
height may vary, also in relation to each other. 

The cumulated AUC from −∞ to z (a given number of standard
deviations) is the cumulative probability (Table 1). This is
coded in EXCEL as NORM.S.DIST(z,TRUE). NORM.DIST
(x,mean,standdev,TRUE) will calculate the same but for any
normal distribution.

In EXCEL, NORM.S.DIST(z,FALSE) and NORM.DIST(x,
mean,standdev,FALSE) give the value of the normal distribution
at the given value of z or x, respectively. These functions
can be used to display a standard normal distribution curve
(NORM.S.DIST) or any normal distribution (NORM.DIST)
in EXCEL.

Calculation of the value at which a certain probability (AUC)
is reached is coded as NORM.S.INV(probability), which returns
the z-value for a standard normal distribution, and NORM.INV
(probability,mean,standard deviation), which returns the
quantity value for the general case.
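For readers who want to cross-check the spreadsheet functions, the following Python sketch uses statistics.NormalDist from the standard library (Python 3.8 or later). The pairing with the EXCEL functions in the comments is an assumed correspondence, and the general distribution with mean 140 and standard deviation 2 is invented for illustration. The function gauss codes formula (10) directly.

```python
import math
from statistics import NormalDist


def gauss(x, mean=0.0, sd=1.0):
    """Formula (10): density of the normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2 * sd**2)) / (sd * math.sqrt(2 * math.pi))


standard = NormalDist(mu=0.0, sigma=1.0)

density_at_0 = standard.pdf(0.0)        # like NORM.S.DIST(0, FALSE)
cumulative_196 = standard.cdf(1.96)     # like NORM.S.DIST(1.96, TRUE)
z_for_975 = standard.inv_cdf(0.975)     # like NORM.S.INV(0.975)
within_2s = standard.cdf(2.0) - standard.cdf(-2.0)

# A general normal distribution, values chosen only for illustration
general = NormalDist(mu=140.0, sigma=2.0)
cumulative_general = general.cdf(142.0) # like NORM.DIST(142, 140, 2, TRUE)

print(round(density_at_0, 4), round(gauss(0.0), 4))
print(round(cumulative_196, 3), round(z_for_975, 2), round(within_2s, 3))
print(round(cumulative_general, 3))
```

The value within_2s is the AUC between −2 and +2 standard deviations and can be compared with Table 1.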
Skewness

This is a measure of the asymmetry of the probability
distribution of a random variable. The skewness can be expressed
numerically and is usually displayed in statistics software.
A positive skewness indicates that the “tail” of the distribution
is extended toward higher values (a “right skewness”) and the
majority of observations are found at lower values. Likewise, a
negative skewness indicates a “left” skewness.

A normal distribution of a continuous variable has a
skewness of 0.

There are many formulas to numerically estimate the
skewness. Statistically, it was defined in terms of the second and
third moment about the mean (Pearson) as


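$g = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{3/2}}$

This formula can be expanded to compensate for the sample
size:

$G = \frac{\sqrt{n \times (n-1)}}{n-2} \times \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{3/2}} = \frac{n}{(n-1) \times (n-2)} \times \sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^3$   (11)

which is the formula used in EXCEL SKEW(cell A:cell B). There
are tables of critical values of G for normal distributions.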

By convention the skewness is interpreted as

• less than −1 or greater than +1: highly skewed.

• between −1 and −1/2 or between +1/2 and +1: moderately
skewed.

• between −1/2 and +1/2: approximately symmetric.
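The moment-based skewness and its sample-size adjusted form can be sketched as follows in Python. The sketch assumes that formula (11) is the adjusted coefficient obtained by multiplying the moment ratio by √(n × (n − 1))/(n − 2); the function names and the data are hypothetical.

```python
import math


def moment_skewness(values):
    """Skewness from the second and third moments about the mean."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2**1.5


def adjusted_skewness(values):
    """Sample-size adjusted skewness, assumed to match formula (11)."""
    n = len(values)
    return math.sqrt(n * (n - 1)) / (n - 2) * moment_skewness(values)


def interpret(skewness):
    """Conventional interpretation of a skewness value."""
    size = abs(skewness)
    if size > 1:
        return "highly skewed"
    if size >= 0.5:
        return "moderately skewed"
    return "approximately symmetric"


# Hypothetical, positively skewed results
data = [1.1, 1.3, 1.4, 1.6, 1.7, 1.9, 2.2, 2.8, 3.9, 6.5]
G = adjusted_skewness(data)
print(round(moment_skewness(data), 3), round(G, 3), interpret(G))
```

The adjustment factor approaches 1 as n grows, so the two estimates converge for large data sets.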

There are different shortcuts to estimate the skewness. A
crude estimate is the Bowley skewness or Quartile skewness,
which gives an indication of the skewness. It compares the
distances of the quartiles (25 and 75 percentiles, p(0.25) and
p(0.75)) from the median p(0.50):

$S = \frac{q_3 - q_1}{q_3 + q_1} = \frac{p(0.25) + p(0.75) - 2 \times p(0.50)}{p(0.75) - p(0.25)}$   (12)

where $q_3$ is the distance between p(0.75) and the median and $q_1$
the distance between p(0.25) and the median. In this formula,
−1 < S < +1. In a symmetrical distribution, when p(0.25)
and p(0.75) are equally distant from the median, S = 0.

S = ±0.1 is regarded as a moderate skewness, whereas ±0.3
is very noticeable.

Since only the middle two quartiles of the distribution are
considered, and the outer two quartiles are ignored, this adds
robustness to the measure but is also a caveat.

The choice of quartiles as the limits is arbitrary, and other
pairs could be justified as well, e.g., p(0.05) and p(0.95).
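The quartile skewness needs only three percentiles, which makes it easy to program. The sketch below uses Python's statistics.quantiles (Python 3.8 or later) with the "inclusive" method; other percentile conventions may give slightly different quartiles. The two data sets are invented for illustration.

```python
from statistics import quantiles


def bowley_skewness(values):
    """Quartile skewness, formula (12), from p(0.25), p(0.50), p(0.75)."""
    p25, p50, p75 = quantiles(values, n=4, method="inclusive")
    return (p25 + p75 - 2 * p50) / (p75 - p25)


symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 1, 2, 2, 3, 4, 6, 9, 15]

print(bowley_skewness(symmetric), round(bowley_skewness(right_skewed), 3))
```

In the second data set the largest value could be replaced by any larger number without changing the result, which illustrates both the robustness and the caveat of ignoring the outer quartiles.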

The Pearson skewness is

$S = \frac{\bar{x} - \text{mode}}{s}$   (13)

and the Pearson second skewness

$S_2 = \frac{\bar{x} - \text{median}}{s}$   (14)

The “second skewness” may be more straightforward to
estimate since the median can easily be calculated. The
interpretation is to express the difference between the mean and
the median in terms of the standard deviation estimated as if the
distribution were normal.


Example 

The concentration of many components of blood is normally
distributed (e.g., S-Calcium) but there are also many whose
distribution is skewed, often positively, e.g., the concentrations
of S-Triglyceride and S-Lp(a).

Suppose we have a data set that is normally distributed with
a mean of zero [0] and a standard deviation of 1.
Characteristically, the median (55), mean (16), and mode are equal.
Suppose further that we add a number of results far above the
mean and an equal number below but close to the mean. This
causes the mean to move to a higher value and the standard
deviation to increase, whereas the median remains the same.
Consequently, the Pearson skewness (14) turns positive.

The “standard” deviation of a skew distribution can no longer
be interpreted as 34 % of the distribution.

The “Pearson skewness index” is

$Sk = \frac{3 \times (\text{mean} - \text{median})}{s}$   (15)

An extensive example of skewness is given below. 
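Before the extensive example, the Pearson measures (13) to (15) can be tried on a small scale. In this Python sketch the data are invented: a symmetric core, to which three high values and three values just below the mean are added, so that the median stays put while the mean moves upward. The function names are illustrative.

```python
from statistics import mean, median, mode, stdev


def pearson_mode_skewness(values):
    """Formula (13): (mean - mode) / s."""
    return (mean(values) - mode(values)) / stdev(values)


def pearson_median_skewness(values):
    """Formula (14): (mean - median) / s."""
    return (mean(values) - median(values)) / stdev(values)


def pearson_skewness_index(values):
    """Formula (15): 3 x (mean - median) / s."""
    return 3 * pearson_median_skewness(values)


# Hypothetical results: a symmetric core, then added high values
# together with an equal number of values just below the mean
core = [-2, -1, -1, 0, 0, 0, 1, 1, 2]
with_tail = core + [4, 5, 6, -0.1, -0.1, -0.1]

print(pearson_mode_skewness(core))
for name, values in (("core", core), ("with tail", with_tail)):
    print(name, median(values),
          round(pearson_median_skewness(values), 3),
          round(pearson_skewness_index(values), 3))
```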

Kurtosis

This is a measure of the “peakedness” of the probability
distribution of a real-value random variable. The kurtosis
can be expressed numerically. Positive kurtosis (leptokurtic)
indicates more peakedness than predicted by a normal
distribution, and negative kurtosis (platykurtic) indicates less
peakedness than a normal distribution. Zero [0] would indicate
that the peakedness was as expected for a normal distribution.
For instance, a rectangular distribution (83) has a
kurtosis of [−1.2].


The t-Distribution 

When the number of observations is low, the calculation of
the normal distribution and the parameters derived from it
should be compensated for the low number. The compensation is
made in the calculation of the standard deviation by introducing
the “degrees of freedom (df)” in the denominator; in this way the
population and sample standard deviations can be distinguished.
See formulas (22) and (28). The t-distribution has the same
bell-shaped symmetrical form as the normal distribution but is
wider, the exact shape depending on the df (21). The
t-distribution approaches the normal distribution when the
number of observations increases.
Transformation of Distributions

Many statistical evaluations require that the distribution
of the data is, or is close to, normal. A data set can sometimes
be transformed to approach a normal distribution by
recalculating the quantity values to logarithms, square roots, or
reciprocal values. These techniques reduce large values more
than small values, and positively skewed data sets may come
close to normal. In general terms, the reciprocal ($1/x_i$) has a
stronger effect on the skewness than the logarithmic
transformation and the square root a weaker effect (see Table 2).

It is necessary to reestablish the original values before
presenting the mean or variation of the original data set. The
variation will usually no longer be symmetrical around the mean
value.

Transformation of skew distributions may have benefits as
well as costs, e.g., loss of information.
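The effect of transforming and then reestablishing the original scale can be illustrated with a short Python sketch. The concentrations below are invented and are not the data behind Table 2; only the logarithmic transformation is shown.

```python
import math
from statistics import mean, stdev

# Hypothetical, positively skewed concentrations
original = [1.2, 1.5, 1.7, 1.8, 2.0, 2.2, 2.6, 3.1, 3.9, 5.6]

logs = [math.log(x) for x in original]
log_mean, log_sd = mean(logs), stdev(logs)

# Back-transform the mean and the mean -/+ s limits to the original scale
center = math.exp(log_mean)
lower = math.exp(log_mean - log_sd)
upper = math.exp(log_mean + log_sd)

print(round(mean(original), 2), round(center, 2))
print(round(center - lower, 2), round(upper - center, 2))   # unequal distances
```

The back-transformed mean is lower than the arithmetic mean of the original values, and the interval around it is wider on the upper side than on the lower side.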


TABLE 2 Descriptive Statistics of the Original and Transformed Distributions

                                   Orig         Ln           Sq Root

Mean (average)                     2.57         0.87         1.57
Median                             2.20         0.79         1.48
Standard deviation                 1.08         0.38         0.32
Standard error of the mean (SEM)   0.12         0.04         0.03
Mode                               1.70         0.53         1.30
Percentile 75                      3.10         1.13         1.76
Percentile 25                      1.77         0.57         1.33

                                   Orig         Antiln       Squared

Mean (average)                     2.57         2.38         2.47
Median                             2.20         2.20         2.20
Mean − s                           1.49         1.62         1.58
Mean + s                           3.66         3.50         3.57
CI (±SEM, n = 84)                  2.45-2.69    2.29-2.48    2.37-2.58
Mode                               1.70         1.70         1.70
Percentile 75                      3.10         3.10         3.10
Percentile 25                      1.77         1.77         1.77
Example

The concentration of a series of 84 patient samples was
measured.

Two data transformations were tried, logarithmic and
square root (Figure 2). Both transformations resulted in a
reduction of the skewness. In the lower panel of the table,
the descriptive statistics of the transformed distributions have
been transformed back to the original format. The means have
shifted, and the confidence interval (CI) is no longer
symmetrical around the estimated mean; the nonparametric
quantities (i.e., median, percentiles) are unchanged. As
illustrated in Table 3, transforming using the reciprocal values
is more powerful than the logarithmic transformation, i.e., it
reduces large values (above the median) proportionally more than
small values, and the square root transformation less so.

Definitions and Metrics 

In the following definitions,

$x_i$ represents the result of an observation and
n the number of observations.

Thus $x_n$ is the nth observation of the series.




FIGURE 2 The appearance of a skew frequency distribution (left) and after 
transformations. The display of the distribution is related to the choice of bin 
sizes, whereas the skewness indicators are independent. 


TABLE 3 Comparison of Skewness Estimates

                    Orig     Ln       Sq Root   Reciproc

Skew (EXCEL)        1.16     0.49     0.83      0.22
Skew (Bowley)       0.36     0.23     0.29      0.09
Skew (Pearson-2)    0.34     0.21     0.28      0.03
Mean, arithmetic:

$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$   (16)

The mean or average answers the question: If all quantities
had the same value, what would that be to achieve the same total
sum?

Example 

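The mean of observations $x_1 = 3$, $x_2 = 4$, $x_3 = 7$, and $x_4 = 10$ is

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{3 + 4 + 7 + 10}{4} = \frac{24}{4} = 6$. The sum of 6 + 6 + 6 + 6 = 24.

The $\bar{x}$ is known as the sample mean or average, whereas the
population mean or average is μ.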

Mean, geometric:

$\bar{x}_G = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n}$   (17)

The geometric mean answers the question: If all quantities
had the same value, what would that be to achieve the same product?
In mathematical terms: “The nth root of the product of n
numbers.”

Example 

The geometric mean of the above example is

$\bar{x}_G = \pm\sqrt[4]{3 \times 4 \times 7 \times 10} = \pm\sqrt[4]{840} = \pm 5.38$

This expression may be more conveniently calculated using
logarithms (1)-(8):

$\ln(\bar{x}_G) = \frac{1}{4}(\ln(3) + \ln(4) + \ln(7) + \ln(10)) = \frac{1}{4}(1.10 + 1.39 + 1.95 + 2.30) = 1.68$

$\operatorname{antiln}(1.68) = 5.38$

The product $5.38 \times 5.38 \times 5.38 \times 5.38 = 5.38^4 \approx 840$.
Note If the product of the results is negative, there must be
an odd number of observations to give an interpretable result.

A consequence is that the antilogarithm of the arithmetic
mean of a logarithmic distribution will be the geometric mean
of the underlying distribution.

Mean, harmonic:

$\bar{x}_H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$   (18)


Example 

Suppose you run the first half of a marathon at 5.00 km/h
and the second at 9.00 km/h. The time would be the
same (about 6.56 h for 42.195 km) as if you had run the whole
distance at 6.43 km/h, the harmonic mean 2/(1/5 + 1/9). If you
instead ran at 5.00 km/h for half of the time and at 9.00 km/h
for the other half, the average speed would be the arithmetic
mean, 7.00 km/h.

The harmonic mean will reduce the influence of extreme,
large values but increase that of small values. The harmonic
mean is the reciprocal of the arithmetic mean of the reciprocals
of the values and is a convenient mean for rates.

For positive values that are not all equal, the harmonic mean
is always the smallest of the geometric, arithmetic, and
harmonic means, the arithmetic always the largest, and the
geometric in between.
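The three means can be compared directly with the functions in Python's standard statistics module (geometric_mean requires Python 3.8 or later). The sketch reuses the observations 3, 4, 7, and 10 and the two running speeds from the examples; the variable names are illustrative.

```python
import math
from statistics import mean, geometric_mean, harmonic_mean

observations = [3, 4, 7, 10]

arithmetic = mean(observations)             # formula (16)
geometric = geometric_mean(observations)    # formula (17)
harmonic = harmonic_mean(observations)      # formula (18)

# The geometric mean via logarithms, as in the worked example
geometric_via_ln = math.exp(mean([math.log(x) for x in observations]))

# Running two equal distances at 5 and 9 km/h
average_speed = harmonic_mean([5, 9])

print(arithmetic, round(geometric, 2), round(harmonic, 2))
print(round(geometric_via_ln, 2), round(average_speed, 2))
```

For these positive values the printed means appear in the order arithmetic, geometric, harmonic, from largest to smallest.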

Mean, weighted:

$\bar{x}_w = \frac{\sum_{i=1}^{k} n_i \times x_i}{\sum_{i=1}^{k} n_i} = \frac{\sum_{i=1}^{k} n_i \times x_i}{N}$   (19)

where $x_i$ is the value of the quantity in each of the k bins, $n_i$ is
the number of observations in each of the corresponding bins,
and N is the total number of observations, i.e., $N = \sum_{i=1}^{k} n_i$.


Example 

A data set consists of five groups with 3, 3, 6, 2, and 7 items,
N = 21. The means of the groups were 5, 9, 3, 8, and 4,
respectively:

$\bar{x}_w = \frac{3 \times 5 + 3 \times 9 + 6 \times 3 + 2 \times 8 + 7 \times 4}{3 + 3 + 6 + 2 + 7} = \frac{104}{21} = 4.95$
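The weighted mean of the example can be reproduced with a few lines of Python. The function name weighted_mean is hypothetical; it simply codes formula (19).

```python
def weighted_mean(values, counts):
    """Formula (19): sum of n_i * x_i divided by the total count N."""
    if len(values) != len(counts):
        raise ValueError("values and counts must have the same length")
    total = sum(counts)
    return sum(n * x for n, x in zip(counts, values)) / total


group_means = [5, 9, 3, 8, 4]
group_sizes = [3, 3, 6, 2, 7]

result = weighted_mean(group_means, group_sizes)
print(round(result, 2))   # 104 / 21
```

Taking the plain average of the five group means would give 5.8, which overstates the influence of the small groups.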

Standard deviation, sample:

$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$   (20)

The standard deviation has the same dimension and unit as
the measured quantity. In the graphical representation of the
normal distribution (the familiar bell-shaped curve) it is
the distance between the peak (arithmetic mean or average)
and the first inflexion point, where the increasingly negative
slope (above the mean) changes and begins to level off towards
the horizontal. Since the function is symmetrical around the
mean, the distance to the inflexion point on the other side of
the mean is the same. The standard deviation is thus the
positive root obtained when the second derivative of the normal
frequency distribution (Gaussian distribution) is set to zero.


Note The standard deviation is the square root of a number
and is always the positive root. If the standard deviation is
used to describe, e.g., the interval where 95 % of observations
of normally distributed data are expected, it is $\bar{x} \pm 1.96 \times s(x)$.



FIGURE 3 Frequency (hatched) and cumulative frequency (solid). z-Values 
are the number of standard deviations. The vertical dotted line represents 
average. The dotted horizontal line crosses the median (cumulative frequency 
0.5) and intersects the gauss curve at its inflexion points, equal to the standard 
deviation. 
TABLE 4 Frequency and Cumulative Frequency Mean = 0, S = 1

Probability    z-Value    Cumul Area    Frequency

0.001          -3.00      0.001         0.004
0.023          -2.00      0.023         0.054
0.159          -1.00      0.159         0.242
0.309          -0.50      0.309         0.352
0.500           0.00      0.500         0.399
0.691           0.50      0.691         0.352
0.841           1.00      0.841         0.242
0.975           1.96      0.975         0.058
0.977           2.00      0.977         0.054
0.999           3.00      0.999         0.004

In EXCEL, the function for the sample standard deviation is
STDEV(cellA:cellB).



The “sum of squared” values appears in many statistical
calculations (e.g., (20)). This means that the individual values
are squared and then added together. Conveniently, there is a
special function in EXCEL: SUMSQ(cellA:cellB).

The standard deviation represents an interval on the X-axis
and is expressed in the same unit as the quantity values. Due to
the shape of the normal distribution curve, the “first standard
deviation” counted from the mean will cover about 34.1 % of
the AUC, the next only about 13.6 %, i.e., together about
48 %. The “third s” will cover only about 2.1 % of the AUC.
Therefore, the interval from −2 s to +2 s includes about 95.4 %,
leaving about 2.3 % below −2 s and 2.3 % above +2 s
(Tables 1 and 4). The z-values in Table 4 were calculated by the
EXCEL function NORMSINV(probability). The cumulative
area and frequency distribution are obtained by
NORM.S.DIST(z,TRUE) and NORM.S.DIST(z,FALSE), respectively.
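The spreadsheet functions named above have close counterparts outside EXCEL. As a minimal sketch, assuming Python's standard `statistics.NormalDist` is an acceptable stand-in for NORM.S.DIST and NORMSINV (the helper name `table4_row` is hypothetical), the cumulative area and frequency columns of Table 4 can be recomputed like this:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
std_normal = NormalDist(mu=0.0, sigma=1.0)

def table4_row(z):
    """Return (z, cumulative area, frequency) for the standard normal curve."""
    cumulative = std_normal.cdf(z)   # counterpart of NORM.S.DIST(z,TRUE)
    frequency = std_normal.pdf(z)    # counterpart of NORM.S.DIST(z,FALSE)
    return z, cumulative, frequency

for z in (-3.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.96, 2.0, 3.0):
    z_value, cumulative, frequency = table4_row(z)
    print(f"{z_value:6.2f} {cumulative:7.3f} {frequency:7.3f}")

# Opposite direction, counterpart of NORMSINV(probability)
z_975 = std_normal.inv_cdf(0.975)
```

The same object also answers the reverse question (which z belongs to a given cumulative probability), so one definition of the distribution serves both directions.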

df for sample standard deviation: 

df = n -1 (21) 

The subtraction of one [1] from the number of observations
(n) in the calculations of the standard deviation can be
explained by the data set being used once already to calculate
the mean, thus losing one df. In general, the df are calculated as
the sample size minus the number of estimated parameters.
With this correction, the variance is an unbiased estimator of
the population variance; the correction is known as Bessel's
correction.


To draw normal distribution curves in EXCEL 
There are many ways to create a normal distribution curve 
in EXCEL. By defining the mean and standard deviation and a 
list of observations, the formula (10) can be used to calculate 
the frequency at each defined quantity value. 

The frequency can also very conveniently be calculated
using the function NORMDIST(x,mean,s(x),FALSE). The function
NORM.S.DIST(z,FALSE) will give the standard normal
distribution, i.e., mean 0 and standard deviation 1. Exchanging
“FALSE” for “TRUE” will give the cumulative frequency.

The functions NORMINV and NORM.S.INV work in the opposite
direction: they return the quantity value (x or z) that corresponds
to a given cumulative frequency of the normal and standard normal
distributions, respectively.


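Standard deviation, short cut:

$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}} = \sqrt{\frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n \times (n-1)}} \tag{22}$$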

This formula gives results identical to those of Equation (20) and
avoids calculating the individual differences between the
observations and the mean, which reduces possible rounding errors.
It is advantageous in programming since the mean does not have to
be calculated first. On the other hand, it involves the squares of
x_i, and the final difference must be calculated with a sufficient
number of significant digits to avoid undue rounding errors.


Variance: s²

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \tag{23}$$

The variance occurs in many statistical calculations, e.g.,
propagation of errors and uncertainties and analysis of variance
(ANOVA).

Example

Suppose we have a set of results 1, 2, 3, 4, 5, 6, 7, 8, 9. Calculate
the standard deviation using formulas (20) and (22)!

The average is 45/9 = 5. The degrees of freedom are 9 − 1 = 8.

Formula (20) yields $\sum (x_i - \bar{x})^2 = 60$. The standard deviation
is $\sqrt{60/8} = 2.74$.

Formula (22) generates: the sum of squared observations
$\sum x^2 = 285$ and the square of the sum of the observations
$(\sum x)^2 = 2{,}025$, thus $(\sum x)^2/9 = 225$. The standard
deviation is $\sqrt{(285 - 225)/8} = 2.74$.

NB using EXCEL SUMSQ(1,2,...,8,9) = 285
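The equivalence of formulas (20) and (22) in the example can be checked with a few lines of code. This is a minimal sketch in Python; the function names `sd_definition` and `sd_shortcut` are hypothetical labels for the two formulas, not names used in the text.

```python
import math

def sd_definition(values):
    """Formula (20): squared differences from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    sum_sq_diff = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_sq_diff / (n - 1))

def sd_shortcut(values):
    """Formula (22): sum of squares minus (square of the sum)/n."""
    n = len(values)
    sum_of_squares = sum(x * x for x in values)   # counterpart of SUMSQ
    square_of_sum = sum(values) ** 2
    return math.sqrt((sum_of_squares - square_of_sum / n) / (n - 1))

results = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(round(sd_definition(results), 2), round(sd_shortcut(results), 2))
```

With large values and small dispersion the subtraction in the shortcut formula loses significant digits, which is the rounding caveat mentioned for formula (22).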

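The CI of the standard deviation of the population is

$$\sqrt{\frac{(n-1) \times s^2}{\chi^2_{(\alpha/2,\,(n-1))}}} = s \times \sqrt{\frac{(n-1)}{\chi^2_{(\alpha/2,\,(n-1))}}} \quad \text{to} \quad \sqrt{\frac{(n-1) \times s^2}{\chi^2_{(1-\alpha/2,\,(n-1))}}} = s \times \sqrt{\frac{(n-1)}{\chi^2_{(1-\alpha/2,\,(n-1))}}} \tag{24}$$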


Example

In an experiment, 20 results were obtained with an average
of 5 g and a standard deviation of 0.7 g. Calculate the 95 % CI
for the standard deviation!

First find the χ² (chi-square) values for the endpoints of
the CI, i.e., at the cumulative probabilities 0.975 and 0.025 for
the lower and higher limits, respectively. Remember that the χ²
table and the EXCEL command CHIINV(p,df) give the part of the
distribution to the right (above) of the limit. Thus, for
df = (n − 1), i.e., 19, χ²(α/2,(n−1)) = CHIINV(0.025,19) = 32.9 and
χ²(1−α/2,(n−1)) = CHIINV(0.975,19) = 8.9, and accordingly the CI
equals 0.53-1.02 g:

$$s \times \sqrt{\frac{(n-1)}{\chi^2_{(\alpha/2,\,(n-1))}}} = 0.7 \times \sqrt{\frac{19}{\mathrm{CHIINV}(0.025,19)}} = 0.53 \quad \text{to} \quad s \times \sqrt{\frac{(n-1)}{\chi^2_{(1-\alpha/2,\,(n-1))}}} = 0.7 \times \sqrt{\frac{19}{\mathrm{CHIINV}(0.975,19)}} = 1.02$$
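The arithmetic of the example can be retraced as follows. This is a minimal sketch in Python that takes the two chi-square values quoted in the text (32.9 and 8.9 for 19 df) as given inputs rather than computing them; the function name `sd_confidence_interval` and its parameter names are hypothetical.

```python
import math

def sd_confidence_interval(s, n, chi2_right_tail_low_p, chi2_right_tail_high_p):
    """Formula (24): CI of a standard deviation.

    chi2_right_tail_low_p  -- chi-square value with alpha/2 to its right
                              (e.g., CHIINV(0.025, df)); gives the lower limit
    chi2_right_tail_high_p -- chi-square value with 1 - alpha/2 to its right
                              (e.g., CHIINV(0.975, df)); gives the upper limit
    """
    df = n - 1
    lower = s * math.sqrt(df / chi2_right_tail_low_p)
    upper = s * math.sqrt(df / chi2_right_tail_high_p)
    return lower, upper

# Values from the example: s = 0.7 g, n = 20, chi-square values 32.9 and 8.9
lower, upper = sd_confidence_interval(0.7, 20, 32.9, 8.9)
print(f"95 % CI of s: {lower:.2f} to {upper:.2f} g")
```

The interval is wider above s than below it, which is the asymmetry the text points out.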





















FIGURE 4 A set of 25 consecutive values was drawn from a normally
distributed data set with a mean of 0 and standard deviation of 1. The
mean and standard deviation were calculated from the first two, the first
three, etc. values of the selection. As indicated, the estimated mean is
comparatively stable and close to the target after about 10 observations,
whereas the standard deviation levels out after about 20 observations.
Note the CI is not symmetrical around the estimated standard deviation
(middle panel). The rightmost panel shows the theoretical confidence
interval of the standard deviation according to the formula (24).
(Horizontal axis: number of observations; the curves show the upper and
lower 95 % confidence limits.)


Note The CI is not symmetrical around the standard
deviation.

As illustrated in Figure 4, the CI of the standard deviation is
very large if based on few observations. The same tendency is
illustrated in a simulated example, whereas the CI of the mean
includes the target value with a few observations.

The standard deviation according to Equation (20) underestimates
σ at low df, on an average, whereas the corresponding
variance is correct. The reason is that taking the square root of
the variances to compute s reduces large variances more than
small ones, and therefore the mean of the s values will
underestimate the true population standard deviation (σ). This
is particularly noticeable at small df. It can be corrected by the
c4-correction. This is rarely applied, mainly because the
estimated variance is not liable to the same problem and the
variance is used in Student's t-test and ANOVA, which are thus
not affected.

The average underestimation of s is about 20 % with 2
observations, 8 % with 4 observations, and about 1 % with
25-26 observations. The c4 correction can be calculated using
EXCEL:

c4 = EXP(LN(SQRT(2/(N - 1))) + GAMMALN(N/2) - GAMMALN((N - 1)/2)) (25)

Number of observations   2        3        4        5        10       25       26       100
c4                       0.7979   0.8862   0.9213   0.9400   0.9727   0.9896   0.9901   0.9975

The correction is achieved by dividing the estimated s(x) by
the appropriate table value.
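Formula (25) translates directly into other environments that offer a log-gamma function. The following is a minimal sketch in Python using `math.lgamma` as the counterpart of GAMMALN; the function names `c4` and `corrected_sd` are hypothetical.

```python
import math

def c4(n):
    """c4 factor for n observations, written like formula (25)."""
    return math.exp(
        math.log(math.sqrt(2.0 / (n - 1)))
        + math.lgamma(n / 2.0)
        - math.lgamma((n - 1) / 2.0)
    )

def corrected_sd(s, n):
    """Divide the estimated standard deviation by c4, as described in the text."""
    return s / c4(n)

for n in (2, 3, 4, 5, 10, 25, 26, 100):
    print(n, round(c4(n), 4))
```

Working with logarithms of the gamma function, as the EXCEL formula does, avoids overflow when n is large.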

Standard deviation, population:

$$s_p = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} \tag{26}$$

The sample standard deviation (20) approaches s_p if the
number of observations is large and equals the root mean
square (27) if x̄ is zero [0].

If the mean is set to zero by definition (e.g., when evaluating
differences of quantity values in a quality control experiment
or in a regular wave function, e.g., the sine function), the df
is equal to the number of observations since no mean has been
calculated.

s_p is also known as the biased population standard deviation.

Root mean square (mean, quadratic; RMS):

$$x_{RMS} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n}} \tag{27}$$

The RMS is a measure of the magnitude of a varying quantity.
It is especially used when variables are positive and negative,
e.g., in describing a sine wave. It corresponds to the standard
deviation of a number of observations with a mean of zero [0]
(cf. 26).

The relation between the quadratic mean and the average
of a set of normally distributed values with a population
standard deviation of s_p(x) (26) is

$$x_{RMS}^2 = \bar{x}^2 + s_p(x)^2 \tag{28}$$

The RMS of the error of a measurement (Δ) (RiLiBAEK IQC
rules in Germany).

The root mean square is used in quality management as a
metric that addresses both the imprecision and the bias. The root
mean square of the deviation of a measurement is

$$\Delta = \sqrt{\frac{\sum_{i=1}^{n} (x_i - x_0)^2}{n}} \tag{29}$$

where x_i are the observations and x_0 a “true value.”

If the mean (16) of the observations and their
standard deviation (20) are x̄ and s, respectively, and the
systematic deviation is (x̄ − x_0) = δ, then, since
$\sum_{i=1}^{n} (x_i - x_0)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - x_0)^2$,
(29) can be rewritten as

$$\Delta = \sqrt{\frac{\sum (x_i - x_0)^2}{n}} = \sqrt{\frac{\sum (x_i - \bar{x})^2 + n(\bar{x} - x_0)^2}{n}} = \sqrt{\frac{(n-1) s^2}{n} + \delta^2} \tag{30}$$


Standard deviation, relative (coefficient of variation):

$$CV = \frac{s}{\bar{x}} \tag{31}$$

Standard deviation, relative, percent:

$$\%CV = \frac{100 \times s}{\bar{x}} \tag{32}$$

The coefficient of variation, percent, is often abbreviated
%CV.

Example

A series of results were 3, 3, 6, 2, 7, 5, 9, 3, 8, and 4. The mean
is 5 and the standard deviation 2.40. The %CV is then
100 × 2.404/5 = 48.1 %.
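The example can be reproduced with the standard library alone. A minimal sketch in Python, where `percent_cv` is a hypothetical helper name and `statistics.stdev` is the sample standard deviation with n − 1 in the denominator, as in formula (20):

```python
from statistics import mean, stdev

def percent_cv(values):
    """Formula (32): relative standard deviation in percent."""
    return 100.0 * stdev(values) / mean(values)

results = [3, 3, 6, 2, 7, 5, 9, 3, 8, 4]
print(mean(results), round(stdev(results), 3), round(percent_cv(results), 1))
```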

Note The relative standard deviation (RSD) and the
coefficient of variation are always (by definitions 31 and 32)
calculated from one [1] standard deviation.

Standard deviation, pooled:

$$s_{pool} = \sqrt{\frac{(n_1 - 1) \times s_1^2 + (n_2 - 1) \times s_2^2 + \cdots + (n_k - 1) \times s_k^2}{(n_1 + n_2 + \cdots + n_k) - k}} \tag{33}$$

The pooled standard deviation is used to find an estimate of
the population standard deviation given several samples, each
with its own number of observations and standard deviation,
measured under different conditions. The mean of the different
series of measurements may vary between samples, but the standard
deviation (imprecision) should remain almost the same. Pooling
standard deviations may provide an improved estimate of
the imprecision.

The pooled standard deviation can be used to describe the
total performance of a laboratory which uses many instruments
to measure the concentration of the same analyte.

Example

A laboratory used three instruments to measure the concentration
of the same analyte. The first instrument measured
12 samples with a standard deviation (s) of 0.35, the second
3 samples with s = 0.55, and the third instrument measured 7
samples with s = 0.40. Estimate the pooled standard deviation,
representing that for samples analyzed on a random choice of
instruments.

$$s_{pool} = \sqrt{\frac{(12 - 1) \times 0.35^2 + (3 - 1) \times 0.55^2 + (7 - 1) \times 0.40^2}{(12 + 3 + 7) - 3}} = 0.39$$
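The pooled estimate for the three instruments can be computed as follows. This is a minimal sketch in Python of formula (33); the function name `pooled_sd` and the representation of each group as an `(n, s)` pair are assumptions made for the illustration.

```python
import math

def pooled_sd(groups):
    """Formula (33): groups is a list of (number of observations, s) pairs."""
    weighted_variances = sum((n - 1) * s ** 2 for n, s in groups)
    degrees_of_freedom = sum(n for n, _ in groups) - len(groups)
    return math.sqrt(weighted_variances / degrees_of_freedom)

# The three instruments of the example: (n, s)
instruments = [(12, 0.35), (3, 0.55), (7, 0.40)]
s_pool = pooled_sd(instruments)
print(round(s_pool, 2))
```

Each variance is weighted by its degrees of freedom, so the instrument with 12 samples dominates the result.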



Squared and multiplied by (N − k), and applied to repeated
measurements of the same material, this is identical to
the “within group” sum of squares of the ANOVA (the squared
pooled standard deviation itself is the within-group mean square);
compare the formula for the within-series sum of squares in
ANOVA, SS_w (105).

If the number of observations in each group is the same
(n_1 = n_2 = ... = n_k), then

$$s_{pool} = \sqrt{\frac{s_1^2 + s_2^2 + \cdots + s_k^2}{k}} \tag{34}$$

i.e., s_pool is the square root of the average variance of the
groups.

If the imprecision is not constant but proportional to the
measured quantity, it is reasonable to assume that there is a
constant RSD characterizing the imprecision. The RSD can be
calculated from the results of a series of measurements as

(35)

If there are only two observations in each group, then
Equation (33) can be rearranged to
Standard deviation (Dahlberg):

$$s = \sqrt{\frac{\sum_{i=1}^{N} d_i^2}{2 \times N}} \tag{36}$$

where d is the difference between results of duplicate
measurements and N is the number of pairs (duplicates). The
standard deviation estimated by the “Dahlberg” formula assumes
that the standard deviation is constant in the measuring interval
(homoscedastic). In any case, it will produce a value that
considers all data pairs.

To estimate the RSD from duplicate observations, using the
“Dahlberg formula,” the relative difference between each pair
of observations is calculated:

$$RSD = \sqrt{\frac{\sum_{i=1}^{N} \left( \frac{x_{i1} - x_{i2}}{(x_{i1} + x_{i2})/2} \right)^2}{2 \times N}} \tag{37}$$

Example

A series of samples were measured in duplicates: 3.6 and 4.0;
10.3 and 9.8; 5.2 and 5.9; 4.6 and 4.2; and 7.2 and 7.6. Calculate
the differences between the duplicates: −0.4; 0.5; −0.7; 0.4; and
−0.4.

The sum of the squared differences $\sum d_i^2 = 1.22$;
$s = \sqrt{1.22/(2 \times 5)} = 0.35$.

Similarly, the RSD:

$$RSD = \sqrt{\frac{\left(\frac{-0.4}{3.80}\right)^2 + \left(\frac{0.5}{10.05}\right)^2 + \left(\frac{-0.7}{5.55}\right)^2 + \left(\frac{0.4}{4.40}\right)^2 + \left(\frac{-0.4}{7.40}\right)^2}{2 \times 5}} = \sqrt{\frac{0.0406}{2 \times 5}} = 0.064 \text{ or } 6.4\,\%$$

The %CV calculated from s and the overall mean of 6.24 is
5.6 %.
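The duplicate example can be recalculated from the raw pairs. The following minimal sketch in Python implements formulas (36) and (37) as they are described in the text, with the pair mean as the reference for the relative difference; the function names are hypothetical. Carried through without intermediate rounding, the RSD comes out at about 0.064.

```python
import math

def dahlberg_sd(pairs):
    """Formula (36): standard deviation from duplicate measurements."""
    sum_sq_diff = sum((a - b) ** 2 for a, b in pairs)
    return math.sqrt(sum_sq_diff / (2 * len(pairs)))

def dahlberg_rsd(pairs):
    """Formula (37): each difference is divided by the mean of its pair."""
    relative = [(a - b) / ((a + b) / 2.0) for a, b in pairs]
    return math.sqrt(sum(r * r for r in relative) / (2 * len(pairs)))

duplicates = [(3.6, 4.0), (10.3, 9.8), (5.2, 5.9), (4.6, 4.2), (7.2, 7.6)]
s = dahlberg_sd(duplicates)
rsd = dahlberg_rsd(duplicates)
overall_mean = sum(a + b for a, b in duplicates) / (2 * len(duplicates))
print(round(s, 2), round(rsd, 3), round(100 * s / overall_mean, 1))
```

The two relative figures differ (about 6.4 % versus 5.6 %) because the first weights each pair by its own level while the second divides a single s by the overall mean.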

Standard deviation, geometric is obtained by calculating the
mean and the standard deviation from logarithmically transformed
measurement results, x̄_g and s_g, respectively. The logarithmic
CI is thus as follows:

$$CI_g = \bar{x}_g \pm z \times \frac{s_g}{\sqrt{n}} \tag{38}$$

To get the quantity values, the antilogarithms of x̄_g and s_g
must be calculated.

Standard error of the mean, SEM:

$$SEM = s(\bar{x}) = \frac{s(x)}{\sqrt{n}} \tag{39}$$

The standard error of the mean expresses the interval within
which a repeated estimate of the mean is assumed to occur
with a given probability. The metric thus expresses how well
a mean is estimated. This is different from the standard
deviation (s), which describes the width of the distribution and
thus participates in the definition of the normal distribution. The
standard error of the mean is also abbreviated SEM = s(x̄),
and with the same logic, the standard deviation is specified
and abbreviated s = s(x).

Confidence interval, CI:

$$CI = \bar{x} \pm t_{(1-\alpha/2,\, n-1)} \times \frac{s}{\sqrt{n}} \tag{40}$$

The magnitude of the t-value depends on the degrees of freedom
(df = n − 1) and the level of confidence (1 − α). The t-value
should be taken from a Student's t-value table or calculated in
EXCEL by the function TINV(α,df), which takes the two-tailed
probability α (e.g., 0.05 for a 95 % CI). For large numbers of
observations (often suggested >30), the t-value can be replaced
by a value from a “z-table” or by EXCEL: NORMSINV(1 − α/2), i.e.,
the t-distribution approaches the normal distribution.

Thus, for large numbers $t_{(1-\alpha/2,\, n-1)} \approx z_{(1-\alpha/2)}$.

Proportions

The binomial distribution describes the distribution of
values that can only have two outcomes, e.g., healthy-nonhealthy
or heads and tails in flipping a coin.

If the total number of observations (trials) is n and the
number of successes (hits, positive findings, etc.) is r, then
the probability (0 < p < 1) or proportion of successes in each
independent trial is

$$p = \frac{r}{n} \tag{41}$$

The variance of the binomial distribution is [p × (1 − p)] and
thus the

Standard deviation of a proportion (p/(1 − p)):

$$s = \sqrt{p \times (1 - p)} \tag{42}$$

The frequency distribution of a proportion is the binomial
distribution, but usually, the normal approximation to the
binomial distribution can be used. This requires a sufficient
number of observations; sufficient in this case means
that both n × p and n × (1 − p) exceed 5, which is equal to
requiring that r and n − r are above 5.

Standard error of a ratio (proportion [p])

$$s(p) = \sqrt{\frac{p \times (1 - p)}{n}} \tag{43}$$

where p is the proportion and n is the number of observations,
i.e., the sample size.

The CI

$$CI(p) = p \pm z \times \sqrt{\frac{p \times (1 - p)}{n}} \tag{44}$$

The above method is often referred to as the traditional
method to estimate s(p). It is not applicable if the proportion
is large or small; often limits of 0.1 and 0.9 are specified. It is
thus often inappropriate to use the traditional method for
estimating the standard error and the CI of diagnostic sensitivity,
diagnostic specificity, and prevalence of disease, where
proportions below and above these limits, respectively, are common.
Instead, the following procedure is recommended, often
referred to as Wilson's method.

If r is the number of observations that refer to a particular
property (e.g., testing positive in an investigation among
diseased, TP) in a sample of n observations (e.g., diseased), then

$$p = \frac{r}{n} \quad \text{and} \quad q = 1 - p \tag{45}$$

Calculate:

$$A = 2 \times r + z^2 \tag{46}$$

$$B = z \times \sqrt{z^2 + 4 \times r \times q} \tag{47}$$

$$C = 2 \times (n + z^2) \tag{48}$$

where z is 1.96 to correspond to a 95 % CI.
Then, the CI is as follows:

$$\frac{A - B}{C} \quad \text{to} \quad \frac{A + B}{C} \tag{49}$$

Example

In a group of 85 (n) known diseased, only 5 tested positive
(r) with a new test. Estimate the proportion (p) of true positives
(TP = sensitivity) and its CI using traditional and alternative
methods!

Thus, n = 85 and r = 5 and p = 5/85 = 0.059; n × (1 − p) = 80;
n − r = 80:

$$A = 2 \times 5 + 1.96^2 = 13.8$$

$$B = 1.96 \times \sqrt{1.96^2 + 4 \times 5 \times \left(1 - \frac{5}{85}\right)} = 9.3$$

$$C = 2 \times (85 + 1.96^2) = 177.7$$

and

$$\frac{13.8 - 9.3}{177.7} = 0.025 \quad \text{to} \quad \frac{13.8 + 9.3}{177.7} = 0.130$$

Thus, the proportion is 6 % with a CI of 2.5-13 %.

Note The CI is not symmetrical around the proportion.

The CI estimated according to the “traditional method” (44)
is $\pm 1.96 \times \sqrt{\frac{5}{85} \times \left(1 - \frac{5}{85}\right) / 85} = \pm 0.050$,
thus from 0.009 to 0.109 or 0.9 % to 11 %.

At low proportions, the CI with this approach may
include impossible negative values of the probability. Try
a positive diagnostic sensitivity of 3 %! (The confidence limit
is negative for p < 3.7 % and larger than 1 for p > 96.3 %.)
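Both approaches are short enough to compare side by side. The following is a minimal sketch in Python of the traditional interval (44) and of the A, B, C scheme (45)-(49); the function names `traditional_ci` and `wilson_ci` are hypothetical, and z = 1.96 is assumed as in the text.

```python
import math

def traditional_ci(r, n, z=1.96):
    """Formula (44): normal approximation around p = r/n."""
    p = r / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

def wilson_ci(r, n, z=1.96):
    """Formulas (45)-(49): Wilson's method written with A, B and C."""
    q = 1 - r / n
    a = 2 * r + z ** 2
    b = z * math.sqrt(z ** 2 + 4 * r * q)
    c = 2 * (n + z ** 2)
    return (a - b) / c, (a + b) / c

# Example from the text: 5 positives among 85 diseased
print([round(limit, 3) for limit in wilson_ci(5, 85)])
print([round(limit, 3) for limit in traditional_ci(5, 85)])

# A low proportion (3 positives in a hypothetical sample of 100)
print([round(limit, 3) for limit in traditional_ci(3, 100)])
```

For the hypothetical case of 3 positives in 100, the traditional lower limit falls below zero, whereas Wilson's limits always stay between 0 and 1.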

Uncertainty of the difference between two proportions
The general rule of error propagation is applicable:

$$u(p_1 - p_2) = \sqrt{(s(p_1))^2 + (s(p_2))^2} = \sqrt{\frac{p_1 \times (1 - p_1)}{n_1} + \frac{p_2 \times (1 - p_2)}{n_2}} \tag{50}$$

z-score:

$$z = \frac{x - \bar{x}}{s} \tag{51}$$

The z-score thus positions an observation in relation to
the mean and the distribution; the position is expressed in
standard deviations. The 95 % confidence limits would thus be
expressed as z = ±1.96. The z-score is often used to express the
performance of measurements by different procedures and
different quantity values. The probability that a value should
be outside (larger than) the z-score is 1 − NORMSDIST(z).
For a two-sided evaluation, use 2 × (1 − NORMSDIST(z)) or a
normal z-table.
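The two probabilities mentioned above can be written out as a short routine. This is a minimal sketch in Python, assuming `statistics.NormalDist().cdf` as the counterpart of NORMSDIST; the function names `z_score` and `tail_probability` are hypothetical.

```python
from statistics import NormalDist

def z_score(x, mean_value, sd):
    """Position of x relative to the mean, in standard deviations."""
    return (x - mean_value) / sd

def tail_probability(z, two_sided=False):
    """Probability of a value beyond z under the standard normal curve."""
    if two_sided:
        # counterpart of 2 x (1 - NORMSDIST(z)), using the size of z
        return 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    # counterpart of 1 - NORMSDIST(z)
    return 1.0 - NormalDist().cdf(z)

z = z_score(7.0, 5.0, 1.0)
print(z, round(tail_probability(z), 3), round(tail_probability(1.96, True), 3))
```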


Poisson Distribution

This describes independent and random occurrences of
events, e.g., radioactive decays per time unit [frequency] (dpm).

If n is the number of events per time unit (e.g., dpm) and T is
the number of time units, the total number of occurrences is
Number of events:

$$\mu = n \times T \tag{52}$$

Standard deviation of the Poisson distribution:

$$\pm\sqrt{\mu} \tag{53}$$

Coefficient of variation of the Poisson distribution:

$$\frac{\sqrt{\mu}}{\mu} = \frac{1}{\sqrt{\mu}} \tag{54}$$

Example

The activity of radon was found to be 340 Becquerel/m³
(1 Becquerel = 1 disintegration/s). The standard deviation is
±18 Becquerel/m³ (√340 ≈ 18) and the RSD 0.05 or %CV = 5 %.

Median (50-Percentile)

The number that separates the higher half of a sample, a
population, or a probability distribution from the lower half
represents the median. The median of a finite list of numbers
can be found by arranging all the observations from the lowest
value to the highest value and picking the middle one. If there
is an even number of observations, the median is not unique,
so one often takes the average of the two middle values.

In an ordered series with an odd number (n) of values:

$$\text{median} = x_{((n+1) \times 0.5)} \tag{55}$$

In an ordered series with an even number (n) of values:

$$\text{median} = \frac{x_{(n \times 0.5)} + x_{(n \times 0.5 + 1)}}{2} \tag{56}$$


Mode 

The mode of a discrete probability distribution is the value x
at which its probability function takes its maximum value or
peak. In other words, it is the value that is most likely to be
sampled. A density function may have several peaks and is
then referred to as multimodal, in contrast to unimodal.

In a normal distribution the mode, median, and mean
coincide.

Percentiles

There is no universally accepted definition of a percentile,
and statistical software packages may use different techniques
for its calculation. The problem is that of definition and
rounding: the percentile can be defined either as the lowest
score that is above the percentage looked for or as the smallest
score that is greater than or equal to the percentage. This
problem is particularly important if the number of observations
is small.

EXCEL: PERCENTILE(interval,p) where p is the percentile,
e.g., 0.75 for the 75th percentile.


Example

The following procedure calculates the score of the value
nearest below the percentile and interpolates in the interval
between the numbers closest to the percentile. Consider, as
an example, a series of seven numbers (n), ordered and
ranked [R]: 3(1), 5(2), 7(3), 8(4), 11(5), 13(6), and 20(7).
Estimate the 35th (p) percentile!

$$p = \frac{100 \times R}{n + 1} \tag{57}$$

and thus R = p/100 × (n + 1) = 35/100 × 8 = 2.8. Accordingly, the
35th percentile corresponds to the rank R = 2.8, and the numerical
value corresponding to this rank shall now be calculated.

The p percentile is obtained by interpolation and is the value
corresponding to

Value at Integer R + Fraction R × (difference between the values
ranked R and R + 1) (58)

In our example, Integer R is 2 and the corresponding
value 5.

The next ranked value is 7 and the Fraction R is 0.8. Thus,

P0.35 = 5 + 0.8 × (7 − 5) = 6.6

This procedure will also give the correct median (50th
percentile), i.e., p = 0.50: R = 50/100 × 8 = 4 and the 4th number
in the example is 8.

In an ordered data set, i.e., when the rank is known, the
percentile is conveniently calculated from Equation (58), i.e.,
by interpolation.
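The interpolation rule of Equations (57) and (58) can be written as a short routine. This is a minimal sketch in Python; the function name `percentile_by_rank` is hypothetical, and returning the lowest or highest value when the rank falls outside 1..n is an assumption of this sketch, not something the text specifies. For the seven-number example, 5 + 0.8 × (7 − 5) evaluates to 6.6.

```python
def percentile_by_rank(values, p):
    """Percentile by rank R = p x (n + 1) and linear interpolation.

    p is given as a fraction, e.g., 0.35 for the 35th percentile.
    """
    ordered = sorted(values)
    n = len(ordered)
    rank = p * (n + 1)                 # Equation (57) solved for R
    integer_rank = int(rank)           # "Integer R"
    fraction = rank - integer_rank     # "Fraction R"
    if integer_rank < 1:               # assumption: clamp below the first rank
        return ordered[0]
    if integer_rank >= n:              # assumption: clamp above the last rank
        return ordered[-1]
    lower_value = ordered[integer_rank - 1]
    upper_value = ordered[integer_rank]
    return lower_value + fraction * (upper_value - lower_value)  # Equation (58)

data = [3, 5, 7, 8, 11, 13, 20]
print(round(percentile_by_rank(data, 0.35), 2), percentile_by_rank(data, 0.50))
```

Other software may use different rank definitions, so results for small data sets can differ from this rule.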

The 25th percentile (p(0.25)) is called the lower quartile, and
the 75th percentile (p(0.75)) is called the upper quartile.

Interquartile interval (IQR):

p(0.75) − p(0.25) (59)

The interquartile interval includes 50 % of the data, which is
also known as the central 50 % of the distribution. Since one σ
on one side of the mean covers about 34 % of a normal distribution
and half the interquartile interval covers 25 %, σ corresponds
roughly to 0.5 × 34/25 × IQR or 0.68 × IQR. This proportional
estimate is only approximate; for a normal distribution the
exact relation is σ = IQR/1.349, i.e., about 0.74 × IQR.


Example

Calculate the interquartile interval using EXCEL:
PERCENTILE(interval,0.75) − PERCENTILE(interval,0.25).


Quantiles 

Quantiles are points taken at regular intervals from the
cumulative distribution function of a random variable. Dividing
ordered data into q essentially equal-sized data subsets creates
q-quantiles; the quantiles are the quantity values that
mark the boundaries between consecutive subsets. Put
another way, the kth q-quantile is the value x such that the
probability that a random variable will be less than x is at most
k/q and the probability that a random variable will be more
than x is at most (q − k)/q.

There are (q − 1) quantiles, with k as an integer satisfying
0 < k < q.

Yet another way to describe the quantile is as a number x_p
such that a proportion p of the population values are less than
or equal to x_p. The quantiles are thus values which divide the
distribution so that there is a given proportion of the
observations below the quantile. For example, the median is a
quantile, the 50th percentile.

Thus, the 0.25 quantile (also referred to as the 25th percentile)
of a variable is a value (x_p) such that 25 % (p) of the values
of the variable fall below or are equal to that value.

Quantiles are used to investigate the normality of a
distribution or to compare the type of frequency function
between two distributions. The quantiles of an unknown
distribution are calculated and compared with those of a
known (normal) distribution in a regression analysis. If
the distribution of the data coincides with the assumed
distribution, the data will follow a linear regression line. This
is recognized as a Q-Q (Quantile-Quantile) plot and gives an
overall impression of one distribution in relation to another
(Figure 5).

Quantiles are also used to estimate an empirical cumulative
frequency plot, often called a mountain plot. The theory
and estimation of mountain plots will be further discussed
in the section on “Comparison of Methods.”

More elaborate tests for normality are the Kolmogorov-
Smirnov and Anderson-Darling tests, which quantitate the
divergence from normality.


Probit 

The probit (probability unit) function is the inverse
cumulative distribution function, i.e., the quantile function, of
the standard normal distribution. The probit function thus
generates the value of a random variable associated with a
specified cumulative probability. The probit only exists for
probabilities above 0 and less than 1.

FIGURE 5 Quantile plots. In the left panel, the quantiles of two data sets
are compared and the relatively linear regression indicates that the
frequency distributions are similar but there is a bias. In the middle panel,
the quantiles of one of the methods (high right) and those of the difference
between the results (low left) are compared with a normal frequency
distribution. Their close proximity to the equal line indicates that they are
close to normal. In the right panel, the distribution of the data is shown as
empirical cumulative plots (“mountain plots”) of the differences and one of
the methods. At least the mountain plot of the differences is almost
symmetrical, whereas the mountain plot of the results of method 1 shows a
left skewness which is indicated also in the Q-Q plot. The vertical lines
indicate the medians and delineate central 95 percentiles, respectively.


In EXCEL, the function is calculated by NORMSINV
(probability) and can be used to simulate the sigmoid curve
for probabilities between 0 and 1 (see Figure 3).


Logit 

$$\mathrm{logit}(p) = \ln\left(\frac{p}{1 - p}\right) = \ln(\mathrm{odds}) \tag{60}$$

where p is the probability.

The logit function is

$$\ln\left(\frac{p}{1 - p}\right) = \ln(p) - \ln(1 - p) = b + a \times X \tag{61}$$

The ratio between two odds is recognized as the odds ratio
(R) (223). This is calculated as

$$R = \frac{\;\frac{p_1}{1 - p_1}\;}{\;\frac{p_2}{1 - p_2}\;} \tag{62}$$

Thus,

$$\ln(R) = \ln\left(\frac{p_1}{1 - p_1}\right) - \ln\left(\frac{p_2}{1 - p_2}\right) = \mathrm{logit}(p_1) - \mathrm{logit}(p_2) \tag{63}$$
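The relation between the logit and the odds ratio in Equations (60), (62), and (63) can be checked numerically. A minimal sketch in Python; the function names and the probabilities 0.8 and 0.5 are illustrative assumptions, not values from the text.

```python
import math

def logit(p):
    """Equation (60): natural logarithm of the odds p/(1 - p)."""
    return math.log(p / (1.0 - p))

def odds_ratio(p1, p2):
    """Equation (62): ratio between the odds of p1 and the odds of p2."""
    return (p1 / (1.0 - p1)) / (p2 / (1.0 - p2))

p1, p2 = 0.8, 0.5   # illustrative probabilities
print(round(odds_ratio(p1, p2), 3), round(logit(p1) - logit(p2), 3))
```

A probability of 0.5 has odds of 1 and therefore a logit of 0, which makes it a convenient reference level.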


Random Numbers 

Random numbers can be obtained from tables or number
generators (e.g., www.random.org) but are also available in
EXCEL. Thus, RAND() will return one single random number.
NORMSINV(RAND()) in an array will generate a set of normally
distributed random numbers with an average of 0 and
standard deviation of 1. The formula

$x̄ + $s(x) × NORMSINV(RAND())

where x̄ is the mean and s(x) is the standard deviation, will
modify the distribution of the random numbers accordingly.

A single formula NORMINV(RAND(),MEAN,SD) copied to a
set of cells will also generate a set of normally distributed
values with the given mean and standard deviation. The
program generates a new set of random numbers at every
recalculation (press F9).
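The same inverse-transformation idea works outside a spreadsheet. The sketch below, in Python, feeds uniform random numbers into the inverse normal distribution, mirroring NORMINV(RAND(),MEAN,SD); the function name, the seed parameter, and the target values 5 and 0.7 are assumptions made for the illustration.

```python
import random
from statistics import NormalDist, mean, stdev

def normal_random_numbers(count, mean_value=0.0, sd=1.0, seed=None):
    """Generate normally distributed numbers by inverting uniform ones."""
    rng = random.Random(seed)          # a fixed seed makes the set repeatable
    dist = NormalDist(mean_value, sd)
    numbers = []
    while len(numbers) < count:
        u = rng.random()               # uniform in [0, 1), like RAND()
        if u > 0.0:                    # the inverse is undefined at exactly 0
            numbers.append(dist.inv_cdf(u))
    return numbers

sample = normal_random_numbers(10000, mean_value=5.0, sd=0.7, seed=1)
print(round(mean(sample), 2), round(stdev(sample), 2))
```

Unlike the spreadsheet, which redraws at every recalculation, a seeded generator returns the same set each time, which is convenient when a simulation has to be repeated.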

Trimmed Means 

To evaluate the effect of outliers (however defined),
trimmed and winsorized means may be used and are often
referred to as robust location estimates. Trimmed and winsorized
means should be used with care if the distribution is not
symmetrical. The resulting distributions are not always Gaussian.
On an average, trimming will underestimate the true
dispersion.

Mean, trimmed:

$$\bar{x}_t = \frac{x_{(k+1)} + x_{(k+2)} + \cdots + x_{(n-k)}}{n - (2 \times k)} = \frac{1}{n - (2 \times k)} \sum_{i=k+1}^{n-k} x_{(i)} \tag{64}$$

1 Winsorizing is named after the statistician Charles P. Winsor (1895-1951).

In the data set, the k lowest and k highest values are
deleted. The arithmetic mean of the remaining data is the
trimmed mean.

Example

See below.

Mean, winsorized: The winsorized mean resembles the
trimmed mean, but rather than discarding the highest and
the lowest numbers, the removed numbers are replaced with
the next higher and next lower number, respectively:

$$w_k = \frac{(k + 1) \times x_{(k+1)} + x_{(k+2)} + \cdots + x_{(n-k-1)} + (k + 1) \times x_{(n-k)}}{n} = \frac{k \times x_{(k+1)} + \sum_{i=k+1}^{n-k} x_{(i)} + k \times x_{(n-k)}}{n} \tag{65}$$

In a perfect Gaussian distribution, the trimmed and the
winsorized means would remain unchanged but the standard
deviation (dispersion) would be reduced.

Example

The results 802, 854, 823, 790, 815, 840, 833, 809, 843, 821
(mean 823.0; s = 19.8) are rearranged in increasing order:
790, 802, 809, 815, 821, 823, 833, 840, 843, 854.

To calculate the trimmed mean, delete 790 and 854:

$$\bar{x}_t = \frac{802 + 809 + 815 + 821 + 823 + 833 + 840 + 843}{8} = 823.25; \quad s = 14.6$$

To calculate the winsorized mean, delete 790 and add 802
and delete 854 and add 843:

$$w_k = \frac{802 + 802 + 809 + 815 + 821 + 823 + 833 + 840 + 843 + 843}{10} = 823.10; \quad s = 16.1$$
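Trimming and winsorizing differ only in what happens to the k values at each end, which the following minimal sketch in Python makes explicit. The function names `trimmed` and `winsorized` are hypothetical; both return the modified data set so that the mean and the standard deviation can be taken from it.

```python
from statistics import mean, stdev

def trimmed(values, k):
    """Drop the k lowest and the k highest values."""
    ordered = sorted(values)
    return ordered[k:len(ordered) - k]

def winsorized(values, k):
    """Replace the k lowest and k highest values by their nearest kept neighbour."""
    core = trimmed(values, k)
    return [core[0]] * k + core + [core[-1]] * k

results = [802, 854, 823, 790, 815, 840, 833, 809, 843, 821]
for label, data in (("all", results),
                    ("trimmed", trimmed(results, 1)),
                    ("winsorized", winsorized(results, 1))):
    print(label, round(mean(data), 2), round(stdev(data), 1))
```

The output shows the pattern stated in the text: the means hardly move, while the standard deviation shrinks most after trimming.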

Dispersion of Data 

There are different measures of the dispersion, in addition
to the standard deviation (variance); the median absolute
deviation (MAD), the mean (average) absolute deviation (AAD),
and the mean squared error (MSE) are the most often used. All
describe the dispersion around a measure of the central
tendency; the MAD in particular is robust to outliers.

MAD,

$$MAD = \mathrm{median}_i\left(\left|x_i - \mathrm{median}_j(x_j)\right|\right) \tag{66}$$

i.e., the median of the observations' absolute differences from
their median.


Example

790, 802, 809, 815, 821, 823, 833, 840, 843, 854; median = 822.
Calculate the absolute difference from the median:

32, 20, 13, 7, 1, 1, 11, 18, 21, 32 and rearrange in ascending
order:

1, 1, 7, 11, 13, 18, 20, 21, 32, 32 and calculate the median of the
differences from the median: MAD = (13 + 18)/2 = 15.5.

If

$$\frac{\left|\text{suspect number} - \text{median}\right|}{MAD} > 5 \tag{67}$$

then the number is often regarded as an outlier.

Standard deviation based on MAD

1.4826 × MAD (68)

provided the data are Gaussian distributed.

Multiple of Median, MOM:

$$MOM = \frac{x_i}{\text{median}} \tag{69}$$

MOM thus expresses the result as a fraction of the median
and is a measure of how far an individual test result deviates
from the median (cf. z-score (51)).

It is commonly used, for instance, in reporting the results of
medical screening tests.

Mean absolute deviation (average deviation or mean absolute
deviation)

$$AAD = \frac{\sum_{i=1}^{n} \left|x_i - \bar{x}\right|}{n} \tag{70}$$

• The AAD from the above example is 15.6.

• The EXCEL function AVEDEV(x1:xn) calculates the AAD.

• The AAD must not be confused with MAD.

• The relation between these measures is MAD < AAD < s.


Further

$$\frac{AAD}{s} = \sqrt{\frac{2}{\pi}} = 0.7979 \tag{71}$$

provided the s is calculated from data that are normally
distributed.

The MAD, AAD, and s all have the same unit as the original
data.


Mean square error

$$MSE = \frac{\sum_{i=1}^{n} (x_i - x_0)^2}{n} \tag{72}$$

where x_0 is a true or a predetermined value according to a
model. Compare RMS (27).
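The dispersion measures of this section can be computed for the ten results of the example with a few helper functions. This is a minimal sketch in Python; the function names are hypothetical, and the outlier limit of 5 is the one quoted with formula (67).

```python
from statistics import mean, median, stdev

def mad(values):
    """Formula (66): median absolute deviation from the median."""
    centre = median(values)
    return median([abs(x - centre) for x in values])

def aad(values):
    """Formula (70): mean absolute deviation from the mean."""
    centre = mean(values)
    return mean([abs(x - centre) for x in values])

def is_outlier(suspect, values, limit=5.0):
    """Formula (67): distance from the median in units of MAD."""
    return abs(suspect - median(values)) / mad(values) > limit

results = [790, 802, 809, 815, 821, 823, 833, 840, 843, 854]
print(mad(results), round(aad(results), 1), round(stdev(results), 1))
print(round(1.4826 * mad(results), 1))      # formula (68)
print(is_outlier(854, results), is_outlier(950, results))
```

For this data set the MAD-based standard deviation (68) comes out somewhat larger than s, a reminder that the factor 1.4826 presumes Gaussian data and that small samples scatter.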


Uncertainty 

The uncertainty concept is understood as an interval within
which the true, or reference, value is supposed to be found
with a defined level of confidence. Any bias is assumed to have
been eliminated or corrected and substituted by an uncertainty
that should be included in the uncertainty budget. This
procedure to eliminate the bias will increase the uncertainty of
the result but not necessarily change its value. An advantage
of the uncertainty concept is that the laboratory takes the
responsibility for containing a bias, which the user normally
cannot do.


New Chapter Measurement Uncertainty

The general formula for estimating error propagation
is based on the partial derivatives (77) of the mathematical
function (measurement function) used to calculate the result.
There are several shortcuts, and a frequently used method that is
applicable to spreadsheet programs is that of Kragten (ref.
see Eurachem Guide CG4 p. 104). The Kragten approximation
requires that the variables are uncorrelated and it cannot
handle complex formulas, e.g., exponentials. When the same
quantity occurs more than once in a measurement function,
the Kragten approximation may overestimate the uncertainty
due to so-called compensating errors.

Propagation Rules 

The combined uncertainty u c of additions and subtractions 

X = X\ ± Xi ± • • • ± %i (73) 

is 

u c (X) = ±\j u(x i) + u(x 2 ) + • • • + u{xi) (74) 


Example 

The variables A = 10 and £> = 21 have the standard un¬ 
certainties of 11 (A) = 0.3 and u(B) = 0.6; the sum a combined 

uncertainty w c (31) = ±\/0.3 2 + 0.6 2 = 0.67. 

The combined uncertainty u c of multiplications and 
divisions 

X = x t x (=)x 2 x (=)••• x (-r)x,- (75) 


is 


u c (X) 

X 



u{x 2 ) V | | (u(xi )\ 2 


(76) 


Example 

The ratio between the variables is 10/21 = 0.48. Estimate the
combined relative and absolute uncertainty of the ratio!

$$\frac{u_c(X)}{X} = \sqrt{\left(\frac{0.3}{10}\right)^2 + \left(\frac{0.6}{21}\right)^2} = 0.041$$

and the absolute uncertainty thus amounts to $u_c(0.48) = 0.020$.
If instead the numbers were multiplied, the relative
uncertainty would be the same but the absolute different since
the product is 210. Thus $u_c(210) = 8.7$.

In many cases, the uncertainties or relative uncertainties, as
appropriate, may be added linearly. This always leads to an
overestimation of the combined uncertainty, the magnitude
of which depends on the relation between the uncertainties
of the terms and factors. In the examples above, the
estimations would be 0.90 instead of 0.67 and 0.059 instead of
0.041, respectively.
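
The two propagation rules can be tried out in a few lines of Python. This is only a sketch: the function names `u_sum` and `u_rel_product` are illustrative choices, and the numbers are those of the examples with A = 10 and B = 21.

```python
import math

def u_sum(uncertainties):
    """Combined standard uncertainty of a sum or difference, cf. Eq. (74)."""
    return math.sqrt(sum(u ** 2 for u in uncertainties))

def u_rel_product(values, uncertainties):
    """Combined relative uncertainty of a product or quotient, cf. Eq. (76)."""
    return math.sqrt(sum((u / x) ** 2 for x, u in zip(values, uncertainties)))

values = [10.0, 21.0]   # A and B
uncerts = [0.3, 0.6]    # u(A) and u(B)

u_add = u_sum(uncerts)                  # uncertainty of A + B
u_rel = u_rel_product(values, uncerts)  # relative uncertainty of A/B and A*B

print(round(u_add, 2))                  # 0.67
print(round(u_rel, 3))                  # 0.041
print(u_rel * values[0] / values[1])    # absolute uncertainty of the ratio
print(u_rel * values[0] * values[1])    # absolute uncertainty of the product
print(sum(uncerts))                     # linear addition, always larger
```

The relative uncertainty is computed once and then scaled by the ratio or the product, which is why the two absolute uncertainties differ although the relative one is the same.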

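A more complete method to estimate the uncertainty of a
function q(x, ..., z) is

$$u_c(q) = \sqrt{\left(\frac{\partial q}{\partial x} \times u(x)\right)^2 + \cdots + \left(\frac{\partial q}{\partial z} \times u(z)\right)^2} \tag{77}$$

In case the variables (e.g., x, ..., z) are not independent, then
the covariance between all the variables must be taken into
consideration

$$u_c(q) = \sqrt{\left(\frac{\partial q}{\partial x} \times u(x)\right)^2 + \cdots + \left(\frac{\partial q}{\partial z} \times u(z)\right)^2 + 2 \times \frac{\partial q}{\partial x} \times \frac{\partial q}{\partial z} \times \mathrm{covar}(x, z) + \cdots} \tag{78}$$

The covariance term is repeated for every pair of correlated
variables. For estimation and definition of the covariance, see
Equations (177)-(179).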
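
The general formula can be evaluated numerically when the partial derivatives are tedious to write out. The sketch below approximates each derivative by a central difference; the function name `propagate`, the step size, and the way covariances are passed (a dictionary keyed by index pairs) are assumptions made for illustration, not a prescribed procedure.

```python
import math

def propagate(func, values, uncertainties, covariances=None, rel_step=1e-6):
    """Combined standard uncertainty of func(*values) from numerical
    partial derivatives; covariances maps index pairs (i, j) to cov(x_i, x_j)."""
    grads = []
    for i, x in enumerate(values):
        h = abs(x) * rel_step or rel_step        # step for the central difference
        up, down = list(values), list(values)
        up[i] += h
        down[i] -= h
        grads.append((func(*up) - func(*down)) / (2 * h))
    variance = sum((g * u) ** 2 for g, u in zip(grads, uncertainties))
    for (i, j), cov in (covariances or {}).items():
        variance += 2 * grads[i] * grads[j] * cov  # one term per correlated pair
    return math.sqrt(variance)

# Ratio A/B with A = 10, B = 21, u(A) = 0.3, u(B) = 0.6 (uncorrelated)
u_ratio = propagate(lambda a, b: a / b, [10.0, 21.0], [0.3, 0.6])

# Sum A + B with fully correlated inputs, cov = u(A) * u(B) = 0.18
u_corr = propagate(lambda a, b: a + b, [10.0, 21.0], [0.3, 0.6],
                   covariances={(0, 1): 0.18})
print(u_ratio, u_corr)
```

With full positive correlation the result for the sum equals the linear addition of the uncertainties (0.90), which illustrates why linear addition is an upper bound.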

The result of an uncertainty analysis is summarized in an
uncertainty budget, resulting in a combined uncertainty ($u_c$).
The level of confidence may be adjusted by multiplying the
combined uncertainty by a coverage factor k, to obtain the
expanded uncertainty (U):

$$U(X) = k \times u_c(X) \tag{79}$$

The value of k shall always be attached to an expanded
uncertainty. A k-value of 2 is generally accepted for a
level of confidence of 95 %. The level of confidence does not
have the same stringency as the “CI” which also considers
the type of distribution (e.g., normal or Student's distribution).

Note The relative uncertainty is always calculated from the
combined standard uncertainty, not the expanded uncertainty
(cf. the Note to Equation 32).

Reference Change Value and Minimal Difference

As an example of error propagation, consider estimating the
least significant difference between two results, in clinical
chemistry known as “minimum difference,” MD.

Example

If the results of two measurements are A and B with the
uncertainties u(A) and u(B), a significant difference (D) must
be larger than the uncertainty of the difference u(D).
Therefore,

$$D > u(D) = \sqrt{u(A)^2 + u(B)^2}; \quad \text{cf. (74)} \tag{80}$$

Example

The desirable level of confidence is usually chosen to be about
95 %, i.e., a k-value of 2 or 1.96 for a normally distributed data
set. It may be reasonable to assume that u(A) = u(B) and thus
the MD is

$$MD > 1.96 \times u(A) \times \sqrt{2} = 2.77 \times u(A)$$

As a rule of thumb, the MD is often accepted as MD = 3 × u(A).

In clinical chemistry, the “reference change value,” RCV,
also includes the biological variation. The variation is usually
expressed in relative terms (%CV(A)):

$$RCV > \sqrt{2 \times \%CV(A_a)^2 + 2 \times \%CV(A_w)^2} = \sqrt{\%CV(A_a)^2 + \%CV(A_w)^2} \times \sqrt{2} \tag{81}$$

where $\%CV(A_a)$ is the coefficient of variation of the
measurement procedure and $\%CV(A_w)$ is the biological
within-individual variation.
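
A minimal sketch of the two quantities follows. The function names and the CV values in the usage lines are hypothetical; the RCV is coded as printed in Equation (81), i.e., without a coverage factor.

```python
import math

def minimal_difference(u_a, u_b=None, k=1.96):
    """Minimal difference between two results; u(B) defaults to u(A)."""
    if u_b is None:
        u_b = u_a
    return k * math.sqrt(u_a ** 2 + u_b ** 2)

def reference_change_value(cv_analytical, cv_within):
    """RCV as in Eq. (81): sqrt(2) times the combined CV (both in %)."""
    return math.sqrt(2) * math.sqrt(cv_analytical ** 2 + cv_within ** 2)

md_factor = minimal_difference(1.0)        # multiple of u(A) when u(A) = u(B)
rcv = reference_change_value(3.0, 4.0)     # hypothetical CVs of 3 % and 4 %
print(round(md_factor, 2), round(rcv, 2))
```

Setting u(A) = 1 returns the factor by which u(A) must be multiplied, which reproduces the value 2.77 quoted in the example.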

Index of Individuality 

The usefulness of reference values in diagnosis is often
expressed as the index of individuality (II):

$$II = \frac{\sqrt{\%CV(A_w)^2 + \%CV(A_a)^2}}{\%CV(A_b)} \approx \frac{\%CV(A_w)}{\%CV(A_b)} \tag{82}$$

where $\%CV(A_b)$ is the between-individual variation, and
$\%CV(A_w)$ and $\%CV(A_a)$ are the within-individual and
analytical variation as in Equation (81).

If low values (II < 0.6) are found, the utility of reference
values is usually limited in diagnosis. When II ≥ 1.4, the
distribution of results for a single individual will cover a major
part of the population reference interval and thus be of
significant importance in diagnosis. A low II does not exclude the
use of a quantity in monitoring a disease or condition in a
specific individual.

Example

The II for S-Creatinine concentration is reported to be about 0.3
and for S-Iron concentration about 1.1.

Type B Estimates of Uncertainty 

The uncertainty of a measurement may be calculated by
statistical means (Type A) or estimated by other methods, e.g.,
literature, experience (Type B). The rectangular and triangular
distributions are frequently used in Type B estimates of the
uncertainty. Estimates by Type A and Type B are treated
equally in an uncertainty budget.

Standard uncertainty of a rectangular distribution:

If all results are distributed within an interval (2a) between
an upper and a lower limit and the probability for a specific
value is the same in the entire interval, then the distribution
is known as “rectangular” or “uniform.” This can also be
expressed as that extreme values, close to the upper or lower
limit of the distribution, will be as probable as values anywhere
within the distribution. No values are, however, expected or
even possible outside the assumed interval.

The uncertainty estimated for a rectangular distribution is
the most conservative, i.e., gives the largest standard
uncertainty:

$$u(X) = \pm\frac{a}{\sqrt{3}} \tag{83}$$

where 2a is Upper Limit - Lower Limit.

Example 

Consider the length of a rod placed far above you. Assume
that it is no less than 50 cm and no longer than 150 cm and
estimate the length and uncertainty! This means the best
estimate is 100 cm, i.e., the middle of the interval, and the
standard uncertainty is:

$$u(\mathrm{rod}) = \frac{150 - 50}{2 \times \sqrt{3}} = \pm 28.9\ \mathrm{cm}$$

Note An expanded uncertainty (k = 2) will reach outside
the assumed limits. This is always the case for a rectangular
distribution, since the assumed interval already covers the
probability that all observations are in the interval.

Standard uncertainty of a triangular distribution:

If a value is more likely than other values within an interval
and no values are expected or possible outside the interval,
then a triangular distribution may be proper. Extreme
values close to the limits of the assumed interval are possible
but less likely than elsewhere in a symmetrical triangular
distribution:

$$u(X) = \pm\frac{a}{\sqrt{6}} \tag{84}$$

The triangular distribution is attractive because of its
simplicity. It is characterized by a lower limit and an upper limit
(LL = b and UL = c) and a mode d. The mean of the distribution
is then

$$\frac{b + c + d}{3} \tag{85}$$

The variance is

$$\sigma^2 = \frac{(b - c)^2 + (b - d)^2 + (c - d)^2}{36} = \frac{b^2 + c^2 + d^2 - b \times c - b \times d - c \times d}{18} \tag{86}$$

In the case of a symmetrical triangular distribution, i.e.,
$d = \frac{b + c}{2}$, the square root of Equation (86) is equal to
Equation (84), whereas in a “right-angle triangular distribution,”
i.e., b = d, it is simplified to

$$\sigma^2 = \frac{(b - c)^2}{18} = \frac{(2a)^2}{18} \tag{87}$$

The square root of the variance is the standard uncertainty:

$$u(X) = \frac{|b - c|}{3 \times \sqrt{2}} = \frac{2a}{3 \times \sqrt{2}} \tag{88}$$


Since the expected value cannot be <b or >c, the uncertainty 
is “one-sided,” i.e., b — u or c+u. 
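
The rectangular and triangular Type B estimates are easy to compare numerically. The sketch below uses the rod example (limits 50 cm and 150 cm) and the general triangular variance of Equation (86); the function names are illustrative.

```python
import math

def u_rectangular(lower, upper):
    """Standard uncertainty of a rectangular distribution: half-width / sqrt(3)."""
    return (upper - lower) / (2 * math.sqrt(3))

def u_triangular(lower, upper, mode=None):
    """Standard uncertainty of a triangular distribution, cf. Eq. (86).
    The mode defaults to the midpoint (symmetrical case)."""
    b, c = lower, upper
    d = (b + c) / 2 if mode is None else mode
    variance = (b * b + c * c + d * d - b * c - b * d - c * d) / 18
    return math.sqrt(variance)

print(round(u_rectangular(50, 150), 1))          # 28.9, as in the rod example
print(round(u_triangular(50, 150), 1))           # symmetrical triangle
print(round(u_triangular(50, 150, mode=50), 1))  # right-angle triangle, d = b
```

For the same limits the rectangular assumption gives the largest standard uncertainty, the right-angle triangle an intermediate one, and the symmetrical triangle the smallest.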

Standard uncertainty from a Gaussian distribution

When the measurement value is most likely to be near the
center of an interval but there is a small, but real, possibility
that there might be values outside the assumed or observed
limits, then the appropriate “density function,” i.e.,
distribution, is often assumed to be Gaussian. The standard
uncertainty is estimated by

u(X) = (89)

where 2 × a is Upper Limit - Lower Limit.


Chi-Square (χ²), an Index of Dispersion

Chi-square is the sum of the squared differences between the
found and expected numbers of observations, each divided by the
expected number:

$$\chi^2 = \sum_{i=1}^{n} \frac{(f_i - F_i)^2}{F_i} \tag{90}$$

where n is the number of “classes.” $f_1$ and $f_2$ are sample
counts of individuals (discrete quantities) which do and do not
possess the property investigated, the corresponding
hypothetical or expected frequencies being $F_1$ and $F_2$.

Chi-squared tests can only be used on actual numbers and
not on percentages, proportions, means, etc. The df is the
number of classes minus 1, i.e., n - 1.

The expected number (F) is

$$F = n \times p; \quad p = \frac{F}{n} \tag{91}$$

Example

The probability of a particular event is 0.28. In an experiment
of 35 observations, the event occurred in 16 cases and did not
occur in 19. The expected number of occurrences is
0.28 × 35 = 9.8, and the expected number of non-occurrences is
thus (1 - 0.28) × 35 = 25.2. Is the outcome what would be
expected?

$$\chi^2 = \sum_{i=1}^{n} \frac{(f_i - F_i)^2}{F_i} = \frac{(16 - 9.8)^2}{9.8} + \frac{(19 - 25.2)^2}{25.2} = 3.922 + 1.525 = 5.4$$

The df in this case is 1 and a table for chi-squared cumulative
probabilities provides the critical χ²-values 3.84 (p = 0.05) and
6.63 (p = 0.01).

The null hypothesis is rejected since the calculated χ² value
is larger than the table value, i.e., there is a significant
difference from what would be expected (p < 0.05).

This can be summarized in Table 5.

The sum of the (O-E) is always zero; accordingly, if there are
only two classes, the (O-E)² will be equal for each class.

The EXCEL function to evaluate the critical χ² is CHIINV(α,df),
where α is the right-tail probability.

Thus, in this case CHIINV((1 - 0.95),1) = 3.84.
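
The goodness-of-fit example can be checked with a short script. This is a sketch using only the standard library; the helper `p_value_df1` relies on the fact that, for one degree of freedom only, the right-tail probability of χ² equals erfc(√(χ²/2)), and is not a general replacement for a χ² table.

```python
import math

def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes, cf. Eq. (90)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df1(chi2):
    """Right-tail probability of chi-square, valid for df = 1 only."""
    return math.erfc(math.sqrt(chi2 / 2))

n, p = 35, 0.28
observed = [16, 19]                # event occurred / did not occur
expected = [n * p, n * (1 - p)]    # 9.8 and 25.2
chi2 = chi_square(observed, expected)

print(round(chi2, 1))              # 5.4
print(p_value_df1(chi2))           # between 0.01 and 0.05
```

The probability returned for 3.84 is close to 0.05, in agreement with the critical value quoted in the text.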

The df for a contingency table equals (number of columns 
minus one) times (number of rows minus one) not counting 
the totals for rows or columns. For the 2x2 contingency table 
(see Tables 6 and 14), this gives (2 — 1) x (2 — 1) = 1. 


TABLE 5 Goodness of Fit

         Observed (O)   Expected (E)   (O - E)   (O - E)²   (O - E)²/E
         16             9.8            6.2       38.44      3.9
         19             25.2           -6.2      38.44      1.5
Total    35             35             0                    5.4

TABLE 6 The Chi-Square is also Used in Evaluating Contingency Tables

              Variable 2
              Data Type 1   Data Type 2   Totals
Category 1    a             b             a + b
Category 2    c             d             c + d
Total         a + c         b + d         a + b + c + d = N

Example

The diagnostic sensitivity and specificity (see section on
“Bayes' Theorem”) of a particular quantity were 0.74 and 0.86,
respectively. The prevalence of disease was 10 %. The
performance of another biomarker was tested on the same
population, and the sensitivity and specificity were 0.70 and 0.77,
respectively. Evaluate any difference between the performances.

The true positives (TP), the false negatives (FN), the true
negatives (TN), and false positives (FP) were calculated
and the outcome summarized in two 2 × 2 tables.



               Quant A (Expected)                  Quant B (Found)
               “Positive”  “Negative”  Total       “Positive”  “Negative”  Total
Diseased       37          13          50          35          15          50
Non-diseased   63          387         450         105         345         450
Sum            100         400         500         140         360         500

Since the comparison was made using the same patient
group, the number of diseased and nondiseased is the same
in both trials. If the markers performed equally, the number
in each cell would be the same and we can formulate the χ²:

$$\chi^2 = \frac{(35 - 37)^2}{37} + \frac{(15 - 13)^2}{13} + \frac{(105 - 90)^2}{90} + \frac{(345 - 360)^2}{360} = 3.54; \quad df = 1$$

The critical χ²-value is 3.84 (p = 0.05), CHIINV(0.05,1), and
therefore, the null hypothesis is not rejected and it is not
likely that there is a difference between the methods.

There is a short-cut method for estimating the χ²-value from
a 2 × 2 table:

$$\chi^2 = \frac{(a \times d - b \times c)^2 \times N}{(a + c) \times (a + b) \times (b + d) \times (c + d)} \tag{92}$$

Applying this formula to our example yields a χ²-value of
3.47. The difference can be explained by rounding errors:

p = 0.06, CHIDIST(3.47,1)

The χ² test for small numbers of observations can be
improved by Yates' continuity correction, which gives smaller
values of χ²:

$$\chi^2 = \sum_{i=1}^{n} \frac{(|f_i - F_i| - 0.5)^2}{F_i} \tag{93}$$


Example A

$$\chi^2 = \frac{(|35 - 37| - 0.5)^2}{37} + \frac{(|15 - 13| - 0.5)^2}{13} + \frac{(|105 - 90| - 0.5)^2}{90} + \frac{(|345 - 360| - 0.5)^2}{360} = 3.15; \quad df = 1$$

The corrected value is smaller than the uncorrected, but the
difference is usually small enough not to change the
conclusion. In practice, the Yates' correction has little influence on
the outcome unless the total number of observations is less
than 40.

Note that it is important how the expected value is calculated. 

The χ² evaluation of a 2 × 2 table is an approximation, and
the exact value can be calculated using Fisher's exact test,
which is based on factorials:

$$p = \frac{(a + b)! \times (c + d)! \times (a + c)! \times (b + d)!}{a! \times b! \times c! \times d! \times N!}$$
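
The factorial expression can be evaluated directly with integer arithmetic. The sketch below returns the probability of one specific 2 × 2 table with fixed margins; the function name and the small table used to try it out are hypothetical. Note that a complete Fisher's exact test, as usually implemented, adds up this probability over the observed table and all tables that are at least as extreme.

```python
from math import factorial

def fisher_table_probability(a, b, c, d):
    """Probability of a single 2 x 2 table (cells a, b / c, d)
    given its row and column totals."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(a) * factorial(b) * factorial(c)
                   * factorial(d) * factorial(n))
    return numerator / denominator

# Hypothetical small table: a = 1, b = 2, c = 3, d = 4
p_table = fisher_table_probability(1, 2, 3, 4)
print(p_table)   # 0.5
```

Python integers do not overflow, so the factorials can be formed explicitly even for several hundred observations.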


Example B 

A classic example of the use of 2 × 2 tables is the evaluation
of treating patients. Assume a study of treating
hyperlipidemia with drug A and drug B. To estimate the expected
number in each cell, it is assumed that there would be no difference
between the treatments and the distribution between the cells
equal to the distribution between the treated groups.

The expected number of hyperlipemic individuals treated
by drug A (x):

$$\frac{x}{150} = \frac{240}{500}; \quad x = \frac{150 \times 240}{500} = 72$$

Similarly, the expected number treated by drug B (y) would
be

$$\frac{y}{150} = \frac{260}{500}; \quad y = \frac{150 \times 260}{500} = 78$$

The expected numbers of those with no or less hyperlipidemia
after treatment with the drugs A (z) and B (v):

$$\frac{z}{350} = \frac{240}{500}; \quad z = \frac{350 \times 240}{500} = 168 \quad \text{and} \quad \frac{v}{350} = \frac{260}{500}; \quad v = \frac{350 \times 260}{500} = 182$$


           Hyperlipemia (Exp.)                 Hyperlipemia (Obs.)
           “Positive”  “Negative”  Total       “Positive”  “Negative”  Total
Drug A     72          168         240         110         130         240
Drug B     78          182         260         40          220         260
Sum        150         350         500         150         350         500

The rows and columns add up to the same numbers and the
χ² is calculated:

$$\chi^2 = \frac{(110 - 72)^2}{72} + \frac{(40 - 78)^2}{78} + \frac{(130 - 168)^2}{168} + \frac{(220 - 182)^2}{182} = 55.1; \quad df = 1$$

Since the χ² value is far above the critical value (3.84), the
null hypothesis is rejected and drug B is considered more
effective than drug A.
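
Example B can be reproduced by deriving the expected counts from the row and column totals and then comparing the full sum with the short-cut formula (92). This is a sketch; the function names are illustrative, and `expected_counts` works for any number of rows and columns.

```python
def expected_counts(table):
    """Expected cell counts: row total x column total / overall total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[rt * ct / total for ct in col_totals] for rt in row_totals]

def chi_square_table(table):
    """Chi-square of a contingency table against its expected counts."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

def chi_square_2x2_shortcut(a, b, c, d):
    """Short-cut formula for a 2 x 2 table, cf. Eq. (92)."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + c) * (a + b) * (b + d) * (c + d))

observed = [[110, 130],   # drug A: "positive", "negative"
            [40, 220]]    # drug B: "positive", "negative"

print(expected_counts(observed))                         # 72, 168 / 78, 182
print(round(chi_square_table(observed), 1))              # 55.1
print(round(chi_square_2x2_shortcut(110, 130, 40, 220), 1))  # 55.1
```

When the expected counts are derived from the margins of the same table, the full sum and the short-cut formula agree exactly.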

The contingency table can be expanded to having r rows and
c columns:

$$\chi^2 = \sum_{i} \frac{(f_i - F_i)^2}{F_i}; \quad df = (r - 1) \times (c - 1) \tag{94}$$

$$F = \frac{\text{column total} \times \text{row total}}{\text{overall total}} \tag{95}$$

where $f_i$ is the found and $F_i$ the expected number in the
corresponding cells, i.e., for cell a, the expected value would be
(a + b + c) × (a + d + g)/N; for cell b, (a + b + c) × (b + e + h)/N; etc.

The results are inserted into Table 7 and the χ² calculated.
The df would be (3 - 1) × (3 - 1) = 4. This expansion is not
further discussed in the present text.

TABLE 7 Contingency Table for Several Categories

            Category I   Category II   Category III   Sum
Sample A    a            b             c              a + b + c
Sample B    d            e             f              d + e + f
Sample C    g            h             i              g + h + i
Sum         a + d + g    b + e + h     c + f + i      N


Chi-Square (χ²), in Comparisons

The imprecision of a measurement method can be
compared with the specification for the method:

$$\chi^2_c = (n - 1) \times \left(\frac{s}{s_0}\right)^2 \tag{96}$$

where $s_0$ is the specified (nominal) standard deviation and s
the standard deviation found for the measurement procedure.

The calculated value $\chi^2_c$ is compared to the table value at the
appropriate degrees of freedom:

$$\chi^2_{crit} = \chi^2_{(\alpha;\,(n-1))} \tag{97}$$

where n is the number of observations and (n - 1) is the
degrees of freedom. If $\chi^2_c > \chi^2_{crit}$, the null hypothesis
H0: s ≤ $s_0$ is rejected.

Example

In a verification procedure, the standard deviation of 10
repeated measurements was 0.25 mmol/L. The manufacturer
claimed an uncertainty of 0.20 mmol/L. Is the found standard
deviation reasonable at α = 0.05?

$$\chi^2_c = (n - 1) \times \left(\frac{0.25}{0.20}\right)^2 = 14.1; \quad \chi^2_{crit(0.05,9)} = 16.9$$

Since the calculated value does not exceed the critical table
value, the claim is not rejected. If, however, the standard
deviation was based on 30 repeated measurements, the
$\chi^2_c$ and $\chi^2_{crit(0.05,29)}$ equal 45.3 and 42.6, respectively, and the
claim would be rejected. The rationale is that the standard
deviation would have been estimated with a considerably
smaller CI.

In EXCEL, the $\chi^2_{(\alpha;(n-1))}$ is calculated as CHIINV(α,df).
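
The test statistic of the verification example is a one-line calculation. In the sketch below the critical values 16.9 and 42.6 are simply those quoted in the text (as a spreadsheet would return them); they are not computed by the code, and the function name is illustrative.

```python
def chi_square_variance(n, s, s0):
    """Test statistic (n - 1) * (s / s0)^2 for comparing a found
    standard deviation s with a specified value s0."""
    return (n - 1) * (s / s0) ** 2

chi2_10 = chi_square_variance(10, 0.25, 0.20)   # 10 repeated measurements
chi2_30 = chi_square_variance(30, 0.25, 0.20)   # 30 repeated measurements

print(round(chi2_10, 1), chi2_10 > 16.9)   # 14.1 False: claim not rejected
print(round(chi2_30, 1), chi2_30 > 42.6)   # 45.3 True: claim rejected
```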

The Rule of Three 


This rule states that if no event happens in n observations,
then it can be assumed, with a probability of 95 %, that it will
happen less frequently than 1 in n/3.

This rule is an acceptable approximation if n > 30.

Example

If in a series of measurements there is no outlier in 600
consecutive measurements, it can be concluded with a 95 %
confidence that there will be less than 1 outlier in 200
measurements, i.e., less than 0.5 %.
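
The approximation can be compared with the exact one-sided limit, which follows from requiring that the probability of observing no event in n independent observations, (1 - p)^n, equals 1 minus the level of confidence. The sketch below assumes independent observations; the function names are illustrative.

```python
def rule_of_three(n):
    """Approximate 95 % upper limit of the event frequency: 3 / n."""
    return 3.0 / n

def exact_upper_limit(n, confidence=0.95):
    """Exact upper limit p solving (1 - p)^n = 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)

for n in (10, 30, 600):
    print(n, rule_of_three(n), exact_upper_limit(n))
```

For n = 600 both expressions give about 0.5 %; the approximation is slightly conservative and the gap widens as n becomes small.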


ANALYSIS OF VARIANCE 


Definitions and Calculation 

The one-way ANOVA was originally designed to allow
comparison of several means of data sets rather than using a
series of Student's independent t-tests (123) for all possible
pairs. The procedure will evaluate if there is a difference
between the means of the studied groups. The ANOVA
calculates the sum of squares within the groups and between the
groups. The significance of a difference is established using
F-statistics (132). Solution of an ANOVA experiment is offered
by all standard statistical packages and built into many
spreadsheet programs. EXCEL supports one-way ANOVA,
but this requires that the “Data analysis Add-in” has been
installed. It is a part of the standard program but needs
activation. It will then be found under “Data.” The solution is often
presented in a standardized format (Table 8).

TABLE 8

Between groups   SS_b     df_b    MS_b    F = MS_b/MS_w   p-Value
Within groups    SS_w     df_w    MS_w
Total            SS_tot   df_t

SS is the “sum of squares”; df, degrees of freedom; and MS, “mean square.” The F-value indicates
the significance of the difference between groups in the study.

TABLE 9 Notations Used in Describing an ANOVA

            Group 1   Group 2   Group k
Sample 1    x_11      x_21      x_k1
Sample 2    x_12      x_22      x_k2
Sample n    x_1n      x_2n      x_kn

The experimental design comprises several (k) groups, runs,
or series of results each including several ($n_k$) observations
(x), with a total number of $N = k \times n_k$ observations. Groups
may comprise different numbers of observations (unbalanced
design). The group means are designated $\bar{x}_i$ and the grand
mean $\bar{\bar{x}}$ (Table 9).

The procedure is to calculate the sum of squares (SS) for the
total, within and between groups.

Total:

$$SS_{tot} = \sum_{i=1}^{N} (x_i - \bar{\bar{x}})^2 \tag{98}$$

$$df_{tot} = N - 1 \tag{99}$$












Alternative calculation:

$$SS_{tot} = (N - 1) \times \mathrm{Var}(x) \tag{100}$$

Between groups:

$$SS_b = n_0 \times \sum_{i=1}^{k} (\bar{x}_i - \bar{\bar{x}})^2 \tag{101}$$

where $n_0$ is the number of observations in the groups. If this
varies between groups (an unbalanced design), then
Equation (101) is rearranged to

$$SS_b = \sum_{i=1}^{k} n_i \times (\bar{x}_i - \bar{\bar{x}})^2 = \sum_{i=1}^{k} n_i \times \bar{x}_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N} \tag{102}$$

$$df_b = k - 1 \tag{103}$$


If the design is unbalanced, use a “weighted” grand mean

$$\bar{\bar{x}} = \frac{\sum_{i=1}^{k} n_i \times \bar{x}_i}{\sum_{i=1}^{k} n_i} = \frac{\sum_{i=1}^{k} n_i \times \bar{x}_i}{N} \tag{104}$$


Within groups:

$$SS_w = (N - k) \times \frac{(n_1 - 1) \times s_1^2 + \cdots + (n_k - 1) \times s_k^2}{n_1 + \cdots + n_k - k} \tag{105}$$

Compare the calculation of $SS_w$ with that of the pooled
standard deviation (33)!

If the groups comprise the same number of observations
(balanced), then

$$SS_w = \sum_{i=1}^{k} \left((n_i - 1) \times s_i^2\right) = \sum_{i=1}^{N} x_i^2 - \sum_{i=1}^{k} n_i \times \bar{x}_i^2; \quad df_w = N - k \tag{106}$$

$$SS_{tot} = SS_b + SS_w \tag{107}$$

Thus, it may be practical to use the simple formula for $SS_{tot}$
(100) and subtract either $SS_w$ or $SS_b$ to calculate the $SS_b$ and
$SS_w$, respectively.

ANOVA to Evaluate Differences Between Means

To evaluate if there is a difference between the means of
several groups, an F-test is performed. In the ANOVA evaluation,
the F is calculated as

$$F = \frac{MS_b}{MS_w} \tag{108}$$

The F is evaluated using an F-table.

In EXCEL, use FINV(prob,df1,df2)

Note In estimating the F-value, the $MS_b$ is always in the
numerator and $MS_w$ in the denominator. See also
Equations (131) and (132).

A high F-value will indicate a significant difference between
groups but not between which groups. There are different
techniques to estimate the significance level of differences between
groups.

A simple approach is to arrange the means in increasing
or decreasing order and calculate the “least significant
difference”:

$$\Delta_{sign} = t_{(1-\alpha),(n-2)} \times s_{within} \times \sqrt{\frac{2}{n}} \tag{109}$$

where $s_{within}$ is the estimated within-group standard deviation
($\sqrt{MS_{within}}$), n is the number of observations in each group
(balanced design), and $t_{(n-2)}$ the t-value for the indicated degrees
of freedom.

Comparison of this value with the difference between
the ordered means will indicate where a significant difference
may be found. This formula is derived from Equation (125)
(independent Student's t), assuming that the standard
deviation is the same for the groups and the difference between
the means is the difference that is tested for significance. This
is directly seen by rearranging the formula to

$$t_{(1-\alpha),(n-2)} = \frac{\Delta_{sign}}{s_{within} \times \sqrt{\dfrac{2}{n}}} \tag{110}$$

A formula applicable in unbalanced designs with different
standard deviations would then be

$$\Delta_{sign} = t_{(1-\alpha),(n-2)} \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \tag{111}$$

Remember though that the standard deviations of the groups
need to be similar and the degrees of freedom may need to
be calculated using Satterthwaite's approach (see
Equation 128).

There are more sophisticated and rigorous solutions to this
problem. It is advised not to use the Student's t-test repeatedly,
to avoid the risk of false significances.

Example 

The S-Cholesterol concentration was measured in 4 groups
of 10 participants on different diets:

Group   1     2     3     4     5     6     7     8     9      10    Mean   Var
1       5.2   5.6   6.8   3.5   5.9   6.3   7.2   8.5   6.8    5.7   6.15   1.77
2       6.2   6.2   7.9   8.2   5.7   6.6   8.1   9     12.1   6.7   7.67   3.57
3       4.2   4.9   6.8   3.7   4.5   5.6   6.4   5.8   6      7     5.49   1.26
4       4.8   7.1   5.9   4.7   5.8   4.9   5.6   7.5   5.1    5.8   5.72   0.89

“Grand mean”: 6.26; total variance: 2.47.

SS_tot: (40 - 1) × 2.47 = 96.50; df = 40 - 1 = 39.

SS_b: 10 × [(6.15 - 6.26)² + (7.67 - 6.26)² + (5.49 - 6.26)²
+ (5.72 - 6.26)²] = 28.85; df = 4 - 1 = 3.

SS_w = (10 - 1) × (1.77 + 3.57 + 1.26 + 0.89) = 67.65;
df = 40 - 1 - 3 = 36.

$$F = \frac{SS_b \times df_w}{SS_w \times df_b} = 5.12$$

The result indicates that there is a significant difference
between the groups.

It is convenient but not necessary to sort the calculated
means in ascending order: 5.49, 5.72, 6.15, and 7.67.

Since clearly the variances (standard deviations) of the
groups are different, formula (111) is applicable. Reviewing
groups 2 and 4:

$$\Delta_{sign} = t_{0.05,18} \times \sqrt{\frac{3.57}{10} + \frac{0.89}{10}} = 2.10 \times 0.668 = 1.40$$

A cross-table to display the differences between the means:

        5.49   5.72   6.15   7.67
5.49    0      0.23   0.66   2.18
5.72           0      0.43   1.95
6.15                  0      1.52
7.67                         0

The number of unique entries in a cross-table is n × (n + 1)/2,
where n is the number of means compared.

Thus, there are significant differences between the highest
(group 2) and the remaining three groups, between which there
is no significant difference.

The data set analyzed by the ANOVA procedure in EXCEL
generates the table:

                 SS      df   MS     F      p-Value   F-Crit
Between groups   28.85   3    9.62   5.12   0.00      2.87
Within groups    67.65   36   1.88
Total            96.50   39
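
The same table can be reproduced without a spreadsheet. The sketch below computes the sums of squares directly from the S-Cholesterol data of the example; the function name and the dictionary keys are illustrative, and the p-value and critical F are left to a table or a statistics package.

```python
def one_way_anova(groups):
    """Sums of squares, degrees of freedom, mean squares and F
    for a one-way ANOVA (balanced or unbalanced)."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total   # weighted grand mean
    means = [sum(g) / len(g) for g in groups]
    ss_b = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b, df_w = k - 1, n_total - k
    ms_b, ms_w = ss_b / df_b, ss_w / df_w
    return {"ss_b": ss_b, "ss_w": ss_w, "ss_tot": ss_b + ss_w,
            "df_b": df_b, "df_w": df_w, "ms_b": ms_b, "ms_w": ms_w,
            "F": ms_b / ms_w}

diets = [
    [5.2, 5.6, 6.8, 3.5, 5.9, 6.3, 7.2, 8.5, 6.8, 5.7],
    [6.2, 6.2, 7.9, 8.2, 5.7, 6.6, 8.1, 9.0, 12.1, 6.7],
    [4.2, 4.9, 6.8, 3.7, 4.5, 5.6, 6.4, 5.8, 6.0, 7.0],
    [4.8, 7.1, 5.9, 4.7, 5.8, 4.9, 5.6, 7.5, 5.1, 5.8],
]
result = one_way_anova(diets)
for key in ("ss_b", "ss_w", "ss_tot", "ms_b", "ms_w", "F"):
    print(key, round(result[key], 2))
```

The within-group sum of squares is obtained here from the individual deviations rather than from rounded variances, which is why it agrees with the spreadsheet value of 67.65.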






Nonparametric Methods 

The use of ANOVA assumes that the data are normally
distributed and that the variances within the groups are of a
similar magnitude (cf. Student's t-test) within the measuring
interval.

The Kruskal-Wallis test is a nonparametric alternative to the
one-way ANOVA, and the Friedman's test can be compared
with the two-way ANOVA. In both tests, all the observations
are ranked together, any ties given the same calculated rank.
Then the sum of the ranks in each method is used to calculate
statistics that can be evaluated by comparing with a χ² table. Both
procedures can be applied to ordinal, interval, and ratio
data. The test will only demonstrate that there is at least one
group which differs from the rest. Posttests may be required
to identify significant differences.

Analysis of Variance Components 

If the same sample is analyzed repeatedly in several series,
the ANOVA can be used to estimate the within- and
between-series variance and from them the combined variance. Thus,
the mean squares (MS) of the ANOVA table are equivalent
to the between- and within-series variance ($s^2$) obtained by dividing
the sum of squares by the corresponding degrees of freedom:

$$MS_b = \frac{SS_b}{df_b} \tag{112}$$

$$MS_w = \frac{SS_w}{df_w} \tag{113}$$


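The within-series variance ($MS_w$) is equivalent to the pooled
variance (33) of the runs; multiplied by the degrees of freedom
(N - k) it gives $SS_w$ (105).

The between-series variance can also be calculated directly as
the average number of observations ($n_0$) in each group times
the variance of the group means $s_g^2$

$$s_g^2 = \frac{\sum_{i=1}^{k} (\bar{x}_i - \bar{\bar{x}})^2}{k - 1} \tag{114}$$

$$MS_b = n_0 \times \frac{\sum_{i=1}^{k} (\bar{x}_i - \bar{\bar{x}})^2}{k - 1} = n_0 \times s_g^2 \tag{115}$$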


The total (combined) variance can only be estimated after
compensation for the contribution from the within-series
variance of the $MS_b$:

“Purified” (or “pure”) between run and intermediary
precision:

$$s_b^2 = \frac{MS_b - MS_w}{n_0} \tag{116}$$

where $n_0$ is the average number of observations in each group,
run, or series.

However, if the number of observations in the groups
differs, then a “harmonic” average number of observations
should be used:

$$n_0 = \frac{N^2 - \sum_{i=1}^{k} n_i^2}{N \times (k - 1)} = \bar{n} - \frac{\sum_{i=1}^{k} (n_i - \bar{n})^2}{N \times (k - 1)} \tag{117}$$

where N is the total number of observations, $n_i$ is the
number of observations in each group, $\bar{n} = N/k$ is their arithmetic
mean, and k is the number of groups.
In most cases, the difference between the arithmetic mean of the
number of observations and Equation (117) is negligible (the
second term in the second form of (117)) in practical work.

$s_b^2$ is also known as the “unbiased estimate of the between
group variance.”

Combined uncertainty:

$$s_{tot} = u_c(X) = \sqrt{s_b^2 + MS_w} \tag{118}$$

As seen from Equation (116), the $MS_b$ cannot be less than the
$MS_w$, which would require the square root of a negative
number. If, however, $MS_b < MS_w$, then the $s_{tot}$, by convention, is set
to $\sqrt{MS_w}$, i.e., $s_b^2 = 0$. This condition can be formulated as

$$s_b^2 = \mathrm{MAX}\left(0;\ \frac{MS_b - MS_w}{n_0}\right) \tag{119}$$

which is also how it is coded in EXCEL.

The total variance of several series of values can be calculated
by different approaches. Consider for instance, the total
estimated variance of results from a laboratory with several
instruments performing the same measurements. The $s_{tot}$ could be the
estimated variance from the total data set ($\sqrt{\mathrm{VAR}(x_{11}:x_{kn})}$,
where k represents the number of series and n the number of
observations in the series; thus $x_{11}:x_{kn}$ reads from the first to
the last). It can be argued that a representative variance is the
average of the variance of the series. As above, the $s_{tot}$ can be
estimated by the ANOVA components. The difference between
these approaches is minimal if there is none or a very small (less
than about 1 %) between-series difference. In all other cases, the
averaged variance underestimates the $s_{tot}$ and that estimated
from the total data set either over- or underestimates the $s_{tot}$,
the ANOVA components being taken as the gold standard. The
within-series variance plays a minor role.

TABLE 10 Results of Repeated Measurements

                     Series 1   Series 2   Series 3   Series 4   Series 5
Result 1             124        125        120        118        125
Result 2             125        122        122        120        126
Result 3             122        127        123        120        128
Result 4             126        125        122        118        124
Result 5             124        122        123        120        124
Mean                 124.2      124.2      122.0      119.2      125.4
Standard deviation   1.5        2.2        1.2        1.1        1.7

TABLE 11 Standard ANOVA Table Normally Displayed

Source of Variation   SS      df   MS     F      p-Value
Between groups        120.4   4    30.1   12.1   <0.001
Within groups         49.6    20   2.48
Total                 170     24


Example 

Control material was measured five times in five series
(Tables 10 and 11). Evaluate the difference between the means
of the series and calculate the within-, between-, and combined
uncertainties!

$$s_b^2 = \frac{30.1 - 2.48}{5} = 5.52; \quad s_b = 2.4$$

$$s_w^2 = MS_w = 2.48; \quad s_w = 1.6$$

$$s_{tot} = \sqrt{2.48 + 5.52} = \sqrt{8.00} = 2.8$$

$$\sqrt{\mathrm{VAR}(x_{11}:x_{kn})} = \sqrt{7.08}; \quad s(x) = \mathrm{STDEV}(x_{11}:x_{kn}) = 2.7$$

Conclusions: there is a significant difference between the
series (F = 12.1). The major source of the combined uncertainty
is the between-series variation.
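
The variance components of the control-material example can be recomputed from the raw results. The sketch below is one possible arrangement: it forms the mean squares, applies the convention that a negative between-series component is set to zero, and uses the first form of the “harmonic” average for $n_0$, which reduces to 5 in this balanced design. The function name and the returned tuple are illustrative.

```python
import math

def variance_components(series):
    """Return MS_b, MS_w, the pure between-series variance s_b^2
    and the combined standard deviation s_tot."""
    k = len(series)
    n_total = sum(len(s) for s in series)
    means = [sum(s) / len(s) for s in series]
    grand_mean = sum(sum(s) for s in series) / n_total
    ms_b = sum(len(s) * (m - grand_mean) ** 2
               for s, m in zip(series, means)) / (k - 1)
    ms_w = sum((x - m) ** 2
               for s, m in zip(series, means) for x in s) / (n_total - k)
    # average number of observations per series (handles unbalanced designs)
    n0 = (n_total - sum(len(s) ** 2 for s in series) / n_total) / (k - 1)
    s_b2 = max(0.0, (ms_b - ms_w) / n0)   # never negative, by convention
    return ms_b, ms_w, s_b2, math.sqrt(s_b2 + ms_w)

control = [
    [124, 125, 122, 126, 124],   # series 1
    [125, 122, 127, 125, 122],   # series 2
    [120, 122, 123, 122, 123],   # series 3
    [118, 120, 120, 118, 120],   # series 4
    [125, 126, 128, 124, 124],   # series 5
]
ms_b, ms_w, s_b2, s_tot = variance_components(control)
print(round(ms_b, 1), round(ms_w, 2), round(s_b2, 2), round(s_tot, 1))
```

The between-series component (about 5.5) clearly dominates the within-series variance (about 2.5), in line with the conclusion of the example.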
Youden Plot

The Youden plot is a graphical method to estimate and
visualize random and systematic errors in measuring identical or
similar samples of two different concentrations or from two or
more sites (laboratories or instruments). The original Youden
plot was designed to identify the random and systematic
errors in many laboratories, e.g., in evaluating EQA or PT
(External Quality Assessment, Proficiency Testing)
experiments. In these schemes, the same samples of two or more
concentrations are distributed simultaneously over time, to
several participating laboratories.

The principle of the Youden plot can also be used in one
laboratory where several samples are measured in duplicates or
by different measurement procedures.

Samples are measured either in duplicates ($X_i$ and $Y_i$, where
i identifies the sample) or the same samples in different
laboratories or by different procedures (where i then identifies the
laboratory or procedure).

The results are plotted ($Y_i$ vs. $X_i$) in a two-dimensional
scattergram with the same scale of the axes. Horizontal and
vertical lines through the median (Manhattan median) of the
results create four quadrants which are crossed by a diagonal
(slope 1) Y = X + a, through the median (Figure 6).

Results with agreeing results will be found in the quadrants
of the diagonal and thus correspond to the “true-positive” and
“true-negative” results as discussed in the section on “Bayes'
Theorem.” Results in the remaining quadrants represent the
“false-negative” and the “false-positive” results.

A systematic error will make the $X_i/Y_i$ move away from the
median along the diagonal, and random errors will move the
coordinates away from the diagonal into the “false quadrants.”
The distribution of the points in the four quadrants will give a
fair overview of the random and systematic errors
encountered. To assist the evaluation, a circle or a rectangle/quadrate
is often displayed around the median (Figure 6).

Results from input to a Youden plot can be used for variance
analysis as a simplified alternative to a two-way ANOVA. If
the same or similar samples are measured using two
procedures (or in two laboratories), their difference can reasonably
be assumed to cancel a systematic error: $D = (X_i + e) - (Y_i + e)$.
The distribution of the differences will therefore be an estimate
of the repeatability (r) standard deviation $s_r$.

FIGURE 6 Youden plot. Solid vertical and horizontal lines delineate
quadrants with false-negative, true-positive, false-positive, and true-negative
relations between results. Solid 45° line is the diagonal, the dotted line,
the equal line and the hatched line the regression line. The rectangle
illustrates the acceptable random and systematic deviations.


$$s_r^2 = \frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{2 \times (n - 1)} = \frac{\mathrm{Var}(D_i)}{2} \tag{120}$$

The sum of the results (S) gives an estimate of the overall
variation or reproducibility (R). The overall variance $(s_R)^2$ is

$$(s_R)^2 = \frac{\sum_{i=1}^{n} (S_i - \bar{S})^2}{2 \times (n - 1)} = \frac{\mathrm{Var}(S_i)}{2} \tag{121}$$

The spread due to interindividual variance, $(s_L)^2$, will then be

$$(s_R)^2 = 2 \times (s_L)^2 + (s_r)^2; \quad (s_L)^2 = \frac{(s_R)^2 - (s_r)^2}{2} \tag{122}$$

A factor 2 in the denominator is necessary because D and S
are estimated in two sets of results.

These calculations should not be confused with the
Dahlberg approach (36) to estimate the standard deviation of a
set of duplicate measurements of different samples (Table 12).

TABLE 12 The Concentration of 10 Samples, Measured with 2 Instruments

     1     2     3     4     5     6     7     8     9     10    Mean   s
1    5.2   5.6   6.8   3.5   5.9   6.3   7.2   8.0   6.8   5.7   6.10   1.24
2    4.8   7.1   6.3   4.7   5.8   4.9   5.6   7.5   5.1   5.8   5.76   0.96

Overall mean: 5.93.


Example

$$(s_r)^2 = \frac{\mathrm{Var}(D_i)}{2} = 0.59; \qquad s_r = 0.77$$

$$(s_R)^2 = \frac{\mathrm{Var}(S_i)}{2} = 0.67; \qquad s_R = 0.82$$

$$(s_R)^2 = 2 \times (s_L)^2 + (s_r)^2; \qquad (s_L)^2 = \frac{(s_R)^2 - (s_r)^2}{2} = \frac{0.67 - 0.59}{2} = 0.04, \qquad s_L = 0.20$$

The relative overall (reproducibility) standard deviation:

$$\%CV = \frac{100 \times 0.82}{5.93} = 13.79.$$


DIFFERENCE BETWEEN RESULTS; 
STUDENT’S t-TESTS 


Difference—Two Scenarios 

Two different scenarios can be identified in the estimation of 
the significance of a difference between results: 

(1) the difference between the means of two different data sets
(e.g., the difference in the concentration of an analyte in
men and women) and
(2) the difference between paired results (e.g., the individual
effect of a treatment).


The difference between the estimated mean and the true
value μ is expressed in relation to the standard error of the
mean (SEM):

$$|t| = \frac{\bar{x} - \mu}{s_x / \sqrt{n}} = \frac{(\bar{x} - \mu) \times \sqrt{n}}{s_x} \qquad (123)$$


Note the similarity to the z-score (51) in which a difference is 
expressed in standard deviations rather than SEM. 

Difference Between Paired Results; Student's $t_{dep}$

If a quantity is measured in the same sample or individual
before and after an intervention, n pairs are measured, and the
difference between the results of pair i is $d_i$, then

$$|t_{dep}| = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{\sum_{i=1}^{n} d_i / n}{s_d / \sqrt{n}} = \frac{\sum_{i=1}^{n} d_i}{s_d \times \sqrt{n}} \qquad (124)$$

where $\bar{d}$ is the average of the differences between the pairs and
$s_d$ its standard deviation.

Degrees of freedom for $t_{dep}$:

$$df = n - 1 \qquad (125)$$

EXCEL supports the $t_{dep}$; the routine requires that an “Add-in”
is installed. It will then be found under “Data” and is called
“t-Test: Paired Two Sample for Means”.

Since the Student's $t_{dep}$-test is based on an estimated mean
and standard deviation, it assumes a Gaussian distribution of
the differences between the observations. The distribution of
the data set itself is not important. If the difference is not
Gaussian distributed, the nonparametric method of choice is the
Wilcoxon signed-rank test (see below).

A χ²-test (chi-square), “sign test,” with or without correction
for continuity for small samples may also be used.


Difference Between Means; Student's $t_{ind}$

A comparison between two means, when the individual
observations are independent, is known as Student's $t_{ind}$.
The t-value as defined in Equation (123) is also known as the
single sample t-test since it assumes a comparison with μ,
which is assumed to have no variance. The degrees of freedom
in the single sample t-test will be n − 1.

If, however, the comparison is between two means,
$\bar{x}_1$ and $\bar{x}_2$, with the standard deviations $s_1$ and $s_2$, and $n_1$ and
$n_2$ observations in the groups, respectively, the t-value is
estimated according to Welch:

$$|t_{ind}| = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s(\bar{x}_1)^2 + s(\bar{x}_2)^2}} \qquad (126)$$

where $s(\bar{x})$ is the standard error of the mean, also abbreviated
SEM.

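If $s_1 \approx s_2$, the degrees of freedom for $t_{ind}$ is

$$df = n_1 + n_2 - 2 \qquad (127)$$

and accordingly

$$|t_{ind}| = \frac{\bar{x}_1 - \bar{x}_2}{s_p \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1) \times s_1^2 + (n_2 - 1) \times s_2^2}{n_1 + n_2 - 2} \qquad (128)$$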


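If $s_1 \neq s_2$, however, use the Welch-Satterthwaite approximation
to estimate the df:

$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{s_1^4}{n_1^2 \times (n_1 - 1)} + \frac{s_2^4}{n_2^2 \times (n_2 - 1)}} = \frac{\left(se_1^2 + se_2^2\right)^2}{\frac{se_1^4}{n_1 - 1} + \frac{se_2^4}{n_2 - 1}} \qquad (129)$$

where $se_i = s_i / \sqrt{n_i}$ is the standard error of the mean of group i.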


EXCEL supports two procedures for calculating the $t_{ind}$: one
when equal variances are assumed, “Two Sample Assuming
Equal Variances,” and one when they are not, “Two Sample
Assuming Unequal Variances.”

The significance of a difference between variances (standard
deviations), and thus the need to apply the Welch-Satterthwaite
approximation, can be estimated by applying an F-test (131).

Note The df according to this formula does not necessarily
yield an integer and may be rounded up, or down, to the
nearest integer in evaluating the obtained t-value from a table.
The use of the Welch-Satterthwaite approximation gives a
conservative estimate of the significance.
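As an illustration, the Welch-Satterthwaite degrees of freedom can be sketched in a few lines of Python. The function name `welch_satterthwaite_df` is hypothetical, and the input values are the standard deviations and numbers of observations from Table 12; this is a minimal sketch, not a replacement for the EXCEL routine.

```python
def welch_satterthwaite_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for two means with unequal variances."""
    se1_sq = s1 ** 2 / n1          # squared standard error of mean 1
    se2_sq = s2 ** 2 / n2          # squared standard error of mean 2
    numerator = (se1_sq + se2_sq) ** 2
    denominator = se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
    return numerator / denominator


# Standard deviations and group sizes taken from Table 12
df = welch_satterthwaite_df(1.24, 10, 0.96, 10)
print(f"df = {df:.1f}")
```

The result is not an integer and lies between the smaller of (n1 − 1), (n2 − 1) and n1 + n2 − 2, which is why it is rounded before a t-table is consulted.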

The uncertainty of the quantity value of a reference material
is usually expressed as a CI or as the standard error, i.e., how
well the value has been determined. Assuming the SEM of the
reference material to be $s(x_{rm})$, the formula will become

$$|t| = \frac{\bar{x}_1 - x_{rm}}{\sqrt{s(\bar{x}_1)^2 + s(x_{rm})^2}} \qquad (130)$$

The quantity value of the reference material may be an
assigned value without uncertainty attached. Then Equation (130)
can be further simplified.

The Student's independent t-test assumes a Gaussian distribution
of the data sets. If this is not the case, the nonparametric
method of choice is the Mann-Whitney U-test, which is described in
some detail below. An alternative to identify a difference
between samples is Tukey's quick test.

A Comparison Between Many Series 

If many series are compared with successive Student's t-tests
and they are independent, then the overall probability of
concluding that there is a difference although there is none
will increase. This is because the probability of making no such
error in any of the comparisons is the product of the
individual probabilities $(1 - \alpha_1) \times (1 - \alpha_2) \times (1 - \alpha_3) \ldots$

If α is 0.05, then for three series the probability of making no
error would be 0.95 × 0.95 × 0.95 = 0.857. The probability that at
least one error occurs is 1 − 0.857 = 0.143, which explains and
quantifies the risk of assuming a difference when there is none.
A simple correction, namely, the Bonferroni correction, is to divide
the α (in this case 0.05) by the number of comparisons, i.e.,
3, so that the α for each comparison equals 0.017. This is the table
value for each of the comparisons, and the degrees of freedom (df)
is n − 2 as used for the t-test; EXCEL: T.INV.2T(α;df) for a
two-sided comparison and T.INV(1 − α;df) for a one-sided
comparison.

The Bonferroni correction is regarded as a conservative test; in
essence, it makes it more difficult to reach and exceed the
critical t-value and thus to demonstrate significance. This leads
to a decreased statistical power, which may be overcome by
including more observations in the comparisons.
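The arithmetic of this section can be sketched as follows. The function names are hypothetical and the comparisons are assumed to be independent, as in the text.

```python
def familywise_error(alpha, comparisons):
    """Probability of at least one false difference in independent comparisons."""
    return 1.0 - (1.0 - alpha) ** comparisons


def bonferroni_alpha(alpha, comparisons):
    """Significance level to apply to each individual comparison."""
    return alpha / comparisons


overall_risk = familywise_error(0.05, 3)     # 1 - 0.95 * 0.95 * 0.95
per_test_alpha = bonferroni_alpha(0.05, 3)   # 0.05 / 3
print(f"overall risk = {overall_risk:.3f}, corrected alpha = {per_test_alpha:.3f}")
```

With the corrected α applied to each of the three comparisons, the overall risk returns to just below the intended 0.05.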


Interpretation of a t-Value 

Estimated t-values are interpreted in a t-table. The entries to
this table are the df and the probability. For a given df find the
value below, and as close to the estimated value as possible in
the table. The column in which it is found represents the
probability. Thus, an estimated t-value can lead to different
interpretations depending on the number of observations. The
higher the t-value, the less probable it is that the compared
quantities are the same (the null hypothesis true).


Example

A t-value of 2.2 was obtained in a study. The df was 20
(Table 13).

Thus, the p-value, i.e., the probability of obtaining a t-value at
least this extreme if the null hypothesis were true, was <0.05.
Note that the interpretation of the t-value is independent of how
the t-value was estimated. Using EXCEL, TINV(0.05,20) = 2.086.


TABLE 13 Extract of a t-Table

        Probability of Two-Tailed Test
df      0.1     0.05    0.01    0.001
9       1.83    2.26    3.25    4.78
18      1.73    2.10    2.88    3.92
20      1.72    2.09    2.85    3.85

Example

Suppose we have the data set:

        1     2     3     4     5     6     7     8     9     10    Mean   s(x)
1       5.2   5.6   6.8   3.5   5.9   6.3   7.2   8.0   6.8   5.7   6.10   1.24
2       4.8   7.1   6.3   4.7   5.8   4.9   5.6   7.5   5.1   5.8   5.76   0.96
Diff.   0.4   -1.5  0.5   -1.2  0.1   1.4   1.6   0.5   1.7   -0.1  0.34   1.09

Assume further that the data have been collected from a
normally distributed data set, that the difference between the
observations is also normally distributed, and that the variances
of the sample results are not significantly different (see below).
We can use the data in two examples.

One is to assume that rows 1 and 2 are from different experiments
and the task is to investigate if the means are different.
The second is to look upon the first row as results obtained
before a treatment and the second as the results after the treatment.

In the first case, we apply the $t_{ind}$ and in the second the $t_{dep}$:

1. $$|t_{ind}| = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{6.10 - 5.76}{\sqrt{\frac{1.24^2 + 0.96^2}{10}}} = \frac{0.34}{\sqrt{0.25}} = \frac{0.34}{0.50} = 0.68; \qquad df = 10 + 10 - 2 = 18$$

From a t-table, we find that the t-value for df = 18 should
exceed 2.1 to be significant in a two-sided test.

2. $$|t_{dep}| = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{0.34 \times \sqrt{10}}{1.09} = 0.99; \qquad df = 10 - 1 = 9$$

From a t-table, we find that the t-value for df = 9 should exceed
2.3 to be significant in a two-sided test. This information can also
be retrieved by the TINV(Probability, df) function in EXCEL.
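The two calculations of this example can be repeated with a short Python sketch using only the standard library. The function names are hypothetical, and the critical values are simply copied from Table 13 rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

row1 = [5.2, 5.6, 6.8, 3.5, 5.9, 6.3, 7.2, 8.0, 6.8, 5.7]
row2 = [4.8, 7.1, 6.3, 4.7, 5.8, 4.9, 5.6, 7.5, 5.1, 5.8]


def t_independent(a, b):
    """|t| for two independent means: difference divided by combined SEM."""
    combined_se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return abs(mean(a) - mean(b)) / combined_se


def t_dependent(a, b):
    """|t| for paired results: mean difference divided by its SEM."""
    d = [x - y for x, y in zip(a, b)]
    return abs(mean(d)) * sqrt(len(d)) / stdev(d)


t_ind = t_independent(row1, row2)   # df = 10 + 10 - 2 = 18
t_dep = t_dependent(row1, row2)     # df = 10 - 1 = 9

# Two-sided critical values for p = 0.05, read from Table 13
T_CRIT = {18: 2.10, 9: 2.26}
print(f"t_ind = {t_ind:.2f}, significant: {t_ind > T_CRIT[18]}")
print(f"t_dep = {t_dep:.2f}, significant: {t_dep > T_CRIT[9]}")
```

Neither t-value exceeds its critical value, so neither comparison indicates a significant difference.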

Comparison of Variances 

To answer the question if there is a significant difference
between the variances of the results of two samples, the F-test
may be used:

$$F = \frac{s_1^2}{s_2^2}, \qquad s_1 > s_2 \qquad (131)$$

The larger s is always in the numerator; thus the F-value
is always >1. The F-value is interpreted using an F-table.
The F-table considers the number of observations in both
groups, i.e., the df may be different in the samples, ($n_1 - 1$)
and ($n_2 - 1$), respectively.

For a one-sided test (95 % probability), use α = 0.05, for a
two-sided test α = 0.05/2.

The critical F-value is calculated in EXCEL as
FINV(α, ($n_1 - 1$), ($n_2 - 1$)).

Note EXCEL requires that the number of observations
with the larger variance should be entered first. EXCEL carries
out only a one-sided test and the α-value should be chosen
accordingly if a two-sided test is desired.


Example 

The variances in the example from the t-test were 1.54
and 0.93. The F-value is 1.66. In EXCEL, FINV(0.05,9,9) = 3.18
for a one-sided test and thus the variances are not significantly
different since the critical value (3.18) is not exceeded.
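A minimal sketch of this comparison in Python follows. The function name `f_ratio` is hypothetical, and the critical value is the FINV result quoted in the text, not something the code computes.

```python
def f_ratio(var_a, var_b):
    """Variance ratio with the larger variance placed in the numerator."""
    larger, smaller = max(var_a, var_b), min(var_a, var_b)
    return larger / smaller


F_CRIT = 3.18                      # FINV(0.05, 9, 9), as quoted in the text
f_value = f_ratio(1.54, 0.93)      # variances from the t-test example
print(f"F = {f_value:.2f}, significant: {f_value > F_CRIT}")
```

Because the larger variance is always placed in the numerator, the order in which the two variances are passed does not matter.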

The F-test may also be used to answer the question if a
method (new) is significantly more precise than another
(old); this is a one-tailed use:

$$F_c = \frac{s_{old}^2}{s_{new}^2} \qquad (132)$$

In either case, the variances are considered significantly
different if $F_c > F_{crit}$ (table value or FINV(α, ($n_1 - 1$), ($n_2 - 1$))).

Since the F-test is based on variances, it assumes a Gaussian
distribution of the data.


NONPARAMETRIC COMPARISONS 


In general, conclusions drawn from nonparametric
methods are less powerful than parametric when the distribution
is known and its properties can be applied. However,
as nonparametric methods make fewer assumptions, they are
more flexible, more robust, and sometimes applicable to
ordinal data.

The nonparametric test for dependent data, corresponding
to the Student $t_{dep}$, is the Wilcoxon signed-rank test. To compare
independent data, the Mann-Whitney U-test corresponds
to that of Student's $t_{ind}$.


Wilcoxon Sign Rank Test for Paired Samples 

The significance of a difference between paired observations
can be evaluated by a nonparametric test, usually
recognized as the Wilcoxon signed-rank test. This is used to test
the null hypothesis that there is no difference between the
paired observations or, in other words, whether the ranked pairs
in the positive and negative groups differ from what would
be expected. The basic approach is thus to estimate whether
the obtained rank sums of the positive (W+) and negative (W−)
differences are what would be expected in view of how many
observations have been made, applying the binomial theorem.
If the number of observations is large (e.g., >20), the
distribution may be normalized and the Wilcoxon test solved by
coding in EXCEL as described below.

Strictly, a condition for applying this test is that the
distribution is symmetrical.

The differences of the paired observations are sorted
and ranked, disregarding the sign of the difference, i.e.,
the absolute numbers of the results are ranked. Thus, the
differences have been exchanged for their ranks in an ordered
data set. In EXCEL, the ranking can be achieved without
physically sorting the data by using the RANK function,
e.g., RANK(ABS(A2),$A$2:$D$10), where $A$2:$D$10 is
the data set and ABS(A2) is the absolute value of the first
difference in the data set. It is important to retain the sign of the
values because in the next step the ranks of the positive
differences and the negative differences shall be added to
obtain (W+) and (W−), respectively.

If the number of ranked differences is about 20 or above, a
normalization (Gaussian approximation) can be used:

$$\mu_W = \frac{n \times (n + 1)}{4} \qquad (133)$$

and

$$\sigma_W = \sqrt{\frac{n \times (n + 1) \times (2n + 1)}{24}} \qquad (134)$$

The difference can then be expressed as a z-score:

$$z = \frac{W_{max} - \mu_W}{\sqrt{\frac{n \times (n + 1) \times (2n + 1)}{24}}} \qquad (135)$$

The z-value can be evaluated using an ordinary t-table, but it
is often suggested to use z > 1.96 irrespective of the number of
observations.

The ranking may include ties. In principle, there are two
types of ties, one when the difference is zero and the other
when the differences are the same. In evaluating the signed-rank
test, all differences that are zero should be disregarded.
Other ties are given the mean of their ranks, e.g., suppose there
are two differences with the rank of 11, then they would each be
given the rank of 11.5 and the next rank 13. If there were three,
they would occupy the ranks 11, 12, and 13, each being given
12, and the next rank 14. The RANK function in EXCEL does not
handle the ranks this way but would assign the rank of 11 to all
in both examples. In EXCEL 2010 there are two functions, RANK.EQ
and RANK.AVG, the first giving each tie the same number, the
second the average of the numbers of the ties.

It is advisable to check the assignment of ranks. The sum of
the possible ranks is always equal to

$$\frac{n \times (n + 1)}{2} \qquad (136)$$

i.e., the same as unique numbers in a “cross-table”.

The calculation of the z-value thus disregards any differences
that are zero and compensates the variance in the
denominator by subtracting $q = \sum \frac{t^3 - t}{48}$, summed over the groups
of tied values, where t is the number of ties in each group. In
laboratory practice, it is rarely necessary to make this compensation.

There are different procedures to evaluate the comparison.
The above may be inappropriate for small numbers when
instead a table is necessary. Normally, the smaller of the
ranked sums is compared with the table value. A rule of thumb
is that the larger the difference between the W+ and W−, the
more probable is the significance of the difference.

Example 

The following paired data are extracted from a comparison
of two measuring procedures for S-Triglycerides, known for
belonging to a skew distribution.

1. Calculate the difference between what is appointed as the
first measurement and the second (first minus second) and keep
track of the sign. Thus, a result that is lower in the first
measurement than in the second gives a negative number.

2. Ignore the sign and rank the absolute difference. Any
differences that are 0 should be completely disregarded.

3. Ties should be given the same (average) rank and the
remaining ranks updated accordingly.

4. Find the sum of the positive ranks and the negative ranks.



           1      2      3      4      5      6      7      8      9      10     11     12     13     14     15
Assay 1    1.24   1.34   1.39   1.41   1.64   1.44   1.48   1.51   1.54   1.54   1.54   1.62   1.63   1.65   1.70
Assay 2    1.30   1.50   1.70   1.50   1.44   1.47   1.60   1.60   1.80   1.50   1.70   1.90   1.81   1.70   1.65
Diff       -0.06  -0.16  -0.31  -0.09  0.20   -0.03  -0.12  -0.09  -0.26  0.04   -0.16  -0.28  -0.18  -0.05  0.05
Abs diff   0.06   0.16   0.31   0.09   0.20   0.03   0.12   0.09   0.26   0.04   0.16   0.28   0.18   0.05   0.05
Rank       5      9.5    15     6.5    12     1      8      6.5    13     2      9.5    14     11     3.5    3.5


Thus, the (W+) is 17.5 and the (W−) 102.5; (W+) + (W−) is
$\frac{15 \times (15 + 1)}{2} = 120$.

$$\mu_W = \frac{15 \times (15 + 1)}{4} = 60; \qquad \sigma_W = \sqrt{\frac{15 \times 16 \times 31}{24}} = 17.61;$$

$$q = \sum \frac{t^3 - t}{48} = \frac{(2^3 - 2) + (2^3 - 2) + (2^3 - 2)}{48} = 0.375.$$

$$z = \frac{102.5 - 60}{\sqrt{17.61^2 - 0.375}} = 2.42$$

Thus, the difference between the pairs would be judged
significant with a p < 0.05 (p = 0.016; two-tailed).
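The steps of this example can be sketched in Python with the standard library only. The function names are hypothetical, and rounding the differences to two decimals is an assumption made here so that equal differences are recognised as ties despite floating-point noise.

```python
from math import sqrt
from statistics import NormalDist

assay1 = [1.24, 1.34, 1.39, 1.41, 1.64, 1.44, 1.48, 1.51,
          1.54, 1.54, 1.54, 1.62, 1.63, 1.65, 1.70]
assay2 = [1.30, 1.50, 1.70, 1.50, 1.44, 1.47, 1.60, 1.60,
          1.80, 1.50, 1.70, 1.90, 1.81, 1.70, 1.65]


def average_ranks(values):
    """Rank values in ascending order; ties share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(first, second, decimals=2):
    """Return W+, W-, z (normal approximation with tie correction) and p."""
    diffs = [round(a - b, decimals) for a, b in zip(first, second)]
    diffs = [d for d in diffs if d != 0]     # zero differences are disregarded
    n = len(diffs)
    abs_diffs = [abs(d) for d in diffs]
    ranks = average_ranks(abs_diffs)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mu = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24
    tie_sizes = {}
    for a in abs_diffs:
        tie_sizes[a] = tie_sizes.get(a, 0) + 1
    q = sum(t ** 3 - t for t in tie_sizes.values()) / 48
    z = (max(w_plus, w_minus) - mu) / sqrt(variance - q)
    p_two_tailed = 2 * (1 - NormalDist().cdf(z))
    return w_plus, w_minus, z, p_two_tailed


w_plus, w_minus, z, p = wilcoxon_signed_rank(assay1, assay2)
print(f"W+ = {w_plus}, W- = {w_minus}, z = {z:.2f}, p = {p:.3f}")
```

The check suggested in the text applies here as well: W+ and W− should add up to n × (n + 1)/2.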


Mann-Whitney Test for Unpaired Samples 

The nonparametric method for unpaired samples is the
Mann-Whitney test or Mann-Whitney U-test and thus the
nonparametric solution to evaluating two independent data
sets comparable to the Student's $t_{ind}$. The test can be described
as ranking all the results as if they belonged to one measurement
and then summing the ranks of the samples belonging to
the two methods, separately. The sums of the ranks are $T_S$ and
$T_L$, representing the smaller and larger rank sums, respectively.
If the groups were the same, the difference in rank sums
should be small. The probability of the sums being equal is
evaluated by the binomial theorem.

There are different ways to evaluate the difference:

$$U = T - \frac{n_a \times (n_a + 1)}{2} \qquad (137)$$

where $n_a$ refers to the number in the group with the lower
rank sum. U is the test statistic if estimated from the
group with the lower rank sum. Consult a “Mann-Whitney
table” which usually has the number of observations in both
groups as entries. The target is to find the last “probability
column” that does not contain the statistic (U). There are
different layouts of the tables, but they only cover up to a
total number of observations of about 20. If there are more
observations, the rank sum approaches a normal distribution
and the z-value can be calculated. Two approaches are
available:

$$\mu_S = \frac{n_S \times (N + 1)}{2} \quad \text{(Altman)} \qquad (138)$$

Alternatively,

$$\mu_S = \frac{n_S \times n_L}{2} \quad \text{(Engineering handbook)} \qquad (139)$$

$$\sigma_S = \sqrt{\frac{n_S \times n_L \times (N + 1)}{12}} \qquad (140)$$

$$z = \frac{T_S - \mu_S}{\sigma_S} \quad \text{(Altman)} \qquad (141)$$

$$z = \frac{U_S - \mu_S}{\sigma_S} \quad \text{(Engineering handbook)} \qquad (142)$$

where $n_S$ is the number of observations in the group with the
lower rank sum, $n_L$ the number of observations in the other group,
$N = n_S + n_L$ the total number of observations, $T_S$ is the lower
rank sum, and $U_S$ is the statistic calculated according to
Equation (137). The z-value can then be evaluated
by a standard normal distribution or NORMSDIST(z).


Example 

The concentration of two different materials was measured 
by the same procedure: 



              1      2      3      4      5      6      7      8      9      10     11     12     Sum
Material 1    1.24   1.34   1.39   1.41   1.64   1.44   1.48   1.51   1.54   1.54   1.54   1.62
Material 2    1.30   1.50   1.70   1.50   1.44   1.47   1.60   1.60   1.80   1.50   1.70   1.90
Rank 1        1      4      5      6      3      7      10     14     15     16     17     19     117
Rank 2        2      12     21.5   12     8      9      20     18     23     12     21.5   24     183


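$$U_S = 12 \times 12 + 0.5 \times 12 \times 13 - 117 = 105; \qquad T_S + T_L = \frac{24 \times (24 + 1)}{2} = 300$$

$$\mu_S = \frac{12 \times (24 + 1)}{2} = 150 \quad \text{(Altman)}$$

$$\mu_S = \frac{12 \times 12}{2} = 72 \quad \text{(Engineering handbook)}$$

$$\sigma_S = \sqrt{\frac{12 \times 12 \times (24 + 1)}{12}} = 17.32$$

$$z = \frac{|117 - 150|}{17.32} = 1.91 \quad \text{(Altman)}$$

The NORMSDIST gives p = 0.03.

$$z = \frac{|105 - 72|}{17.32} = 1.90 \quad \text{(Engineering handbook)}; \quad p = 0.03$$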

The probability can also be estimated from the test statistic
(U). A selection of lines from an appropriate Mann-Whitney
table indicates the probability:

n1    n2    p = 0.1    p = 0.05   p = 0.02   p = 0.01
11    12    104-160    99-165     94-170     90-174
12    12    120-180    115-185    109-191    105-195
12    13    125-187    119-193    113-199    109-203

In this example, there are 12 observations in each group and
the test statistic, $U_S$, is 105. The last column of the table that does
not include the test statistic is p = 0.02.
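The normal approximation of this example can be sketched in Python starting from the rank sum given in the table (117 for 12 + 12 observations). The function name is hypothetical, and the ranks themselves are taken from the example rather than recomputed. With U from Equation (137) the statistic is 39, the complement of the 105 used above (39 + 105 = 12 × 12); both lie equally far from 72, so the z-value is the same.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_z(t_s, n_s, n_l):
    """z from the smaller rank sum t_s, by the rank-sum and the U route."""
    n_total = n_s + n_l
    sigma = sqrt(n_s * n_l * (n_total + 1) / 12)

    mu_rank_sum = n_s * (n_total + 1) / 2       # expected rank sum (Altman)
    z_rank_sum = abs(t_s - mu_rank_sum) / sigma

    u_s = t_s - n_s * (n_s + 1) / 2             # Equation (137)
    mu_u = n_s * n_l / 2                        # expected U (Engineering handbook)
    z_u = abs(u_s - mu_u) / sigma
    return z_rank_sum, z_u


z_altman, z_handbook = mann_whitney_z(117, 12, 12)
p_one_sided = 1 - NormalDist().cdf(z_altman)
print(f"z = {z_altman:.2f} and {z_handbook:.2f}, one-sided p = {p_one_sided:.2f}")
```

The two approaches differ only in the quantity that is centred, so they return the same z-value apart from rounding.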


REGRESSION 


Regression is the statistician's term for describing the 
relation (dependence, association) between two variables. 
By convention, the independent (reference or comparative) 
variable is shown on a horizontal axis (the X-axis) and the 
dependent (test) variable on the vertical (Y-axis). It may 
be useful to visualize the independent variable as the cause 
and the dependent as the effect variable. This is a common 
terminology particularly in multivariate analysis but that 
does not imply that the regression will address the causality 
of a found association. 

In analytical work, regression analysis is used for calibration
functions. Regression is also used to compare the results
of two measurement procedures.

To establish a regression function, the two quantities 
are measured in the same sample and thus pairs of values will 
be obtained which can be represented in a two-dimensional 
diagram, often recognized as a “scattergram” (Figure 6). The 
simplest regression function describes a linear (first order) 
relationship, but there are innumerable types of functions. 
The mathematical function that describes a linear relationship 
(regression) can be established from a minimum of two pairs 
of observations. Axiomatically, one and only one straight line 
can be drawn between two points. 

It is recommended in comparisons to display observations
in a scattergram to provide a visual impression of the data
set. This will facilitate recognizing trends, “outliers” and
distribution of data points.

Ordinary Linear Regression

The ordinary linear regression (OLR) or linear least square
regression function describes a straight line in a two-dimensional
diagram. Its mathematical representation is

$$Y = b \times X + a \qquad (143)$$

where b is the slope of the line and a is the Y-intercept, i.e., the
Y-value where the line crosses the Y-axis, i.e., the value
of Y when X = 0.

The OLR establishes a line which minimizes the vertical
differences between each observation and the line and
disregards the relation to the x value. Therefore, it is important to
choose the variable with the smaller measurement uncertainty
as the independent quantity on the horizontal axis
(x-value). The regression line is a kind of average of all
observations in the measuring interval. It is always centered on the
average of the independent and average of dependent variables
($\bar{x}$/$\bar{y}$).

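If a set of paired observations ($x_i$/$y_i$) and the number of pairs
(n) are given, the regression function can be calculated.

First the slope ($b_{y/x}$) or “regression coefficient” is calculated:

$$b_{y/x} = \frac{\sum_{i=1}^{n}[(x_i - \bar{x}) \times (y_i - \bar{y})]}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n}(x_i \times y_i) - n \times \bar{x} \times \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \times \bar{x}^2} \qquad (144)$$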


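The formulas can be simplified by defining the “sum of
squares”:

$$SS_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2 = (n - 1) \times s(x)^2 \qquad (145)$$

$$SS_{yy} = \sum_{i=1}^{n}(y_i - \bar{y})^2 = (n - 1) \times s(y)^2 \qquad (146)$$

$$SS_{xy} = \sum_{i=1}^{n}(x_i \times y_i) - \frac{\sum_{i=1}^{n} x_i \times \sum_{i=1}^{n} y_i}{n} = \sum_{i=1}^{n}(x_i - \bar{x}) \times (y_i - \bar{y}) \qquad (147)$$

Then,

$$b_{y/x} = \frac{SS_{xy}}{SS_{xx}} \qquad (148)$$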


The means of the variables are assumed to be on the line and
thus satisfy the function. The pair of means is also recognized
as the centroid of the function. The means of the results of the
test (y) and reference (x) measurements can thus be used,
together with the estimated slope, to estimate the intercept
and thus the regression function can be calculated:

$$\bar{y} = b_{y/x} \times \bar{x} + a; \qquad a = \bar{y} - b_{y/x} \times \bar{x} \qquad (149)$$


Since the means of the quantity values are used in the
calculation of the slope and also the intercept, it is important that the
means are representative of the quantity values. This, strictly,
requires that the quantity values are normally distributed
along both axes.

In the OLR, the sum of the squared vertical distances
between the observations ($y_i$) and the point calculated by the
regression function on the regression line ($\hat{y}_i$) is minimized.
These distances are recognized as “residuals.”

A consequence of the OLR model is that it does not include,
or consider, any measurement uncertainty of the independent
variable.

Many quantitative relations between a signal and a concentration
in analytical chemistry are linear, and the OLR is frequently
used in calibrations. There are many exceptions
from linearity in measuring systems, e.g., the relation between
signal and concentration in immunoassays is rarely linear.

A calibration is usually performed and displayed with the
signal on the Y-axis and the concentration on the X-axis. The
concentration of the calibrator is usually known with a small,
negligible, or zero uncertainty, and therefore, the OLR is a
suitable model for calibration if the regression is linear. When the
calibration is used to convert a signal to concentration, its
reverse is used, i.e., the signal (Y) is entered to the calibration
function $X = \frac{Y - a}{b}$.

The estimated concentration will thus be traceable to the
calibrator via the calibration function.

The uncertainty of the signal is transferred to the estimated
concentration, and the result will have an increased
uncertainty compared to that of the calibrator value.

The uncertainty (standard error) of the slope and intercept
can be estimated by formulas of several different formats,
giving identical results:

$$u(b) = \frac{s_{y/x}}{\sqrt{SS_{xx}}} = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{(n - 2) \times SS_{xx}}} \qquad (150)$$

where $\hat{y}_i$ is the value of the dependent variable estimated from
the corresponding $x_i$ and the regression function and thus
$(y_i - \hat{y}_i)$ is the residual.

$s_{y/x}$ is the standard deviation of the residuals:

$$s_{y/x} = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n - 2}}$$

Note The uncertainty that is obtained corresponds to the
standard error of the slope. Therefore, the CI of the slope is
±z × u(b).

The significance of the slope being different from zero (i.e.,
horizontal), or indefinite [1] (i.e., vertical), is obtained by
calculating the Student's independent t-value (126). The standard
error of these extremes is zero and the calculation of the t-value
simplified as


f = 



u{bf - 0 


and 


respectively (151)

The t-value is evaluated using an ordinary t-table and
df = n − 2. At low values of the slope ($b_{y/x}$), it may be important
to demonstrate that the slope is different from zero; if not, all
x-values give the same y-value, i.e., Y = a.

Note The correlation of the variables can be significant, and
the coefficient of variation high even if the slope is not
significantly different from zero. If the slope is zero, however,
there cannot be an association between the variables or
quantities.

The uncertainty (standard error) of the intercept includes
the uncertainty of the residuals, $s_{y/x}$ (164). The formula comes
in many forms:

$$u(a) = s_{y/x} \times \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n \times SS_{xx}}}$$

If the data are displayed in an EXCEL spreadsheet, the OLR
can be directly shown in the graph, by adding a “trendline.”
There are also functions to directly calculate the slope and
intercept for a data set, e.g., SLOPE(Y1:Yn,X1:Xn) and
INTERCEPT(Y1:Yn,X1:Xn), where (Y1:Yn, X1:Xn) defines the
data set.


Note A characteristic of the OLR, or “method of linear least
squares,” is that the variance (uncertainty) of measurements of
the independent quantity (X value) is assumed to be zero or
that the ratio between $s(y)^2$ and $s(x)^2$, referred to as λ,
discussed below (153), is large. The OLR is relatively robust,
and acceptable results may be obtained also if the variance of X
values is >0. Further, the variance of the dependent variable
(Y) should be homoscedastic, i.e., the same within the
measuring interval.

The OLR is sensitive to outliers, and extreme values will
therefore have a major impact on the OLR function (see section
on “Leverage”).

The quantity values should ideally be normally distributed
in both directions and the OLR is regarded as rather robust
also in that respect.

Example 

Estimate the OLR of the following data

           1      2      3      4      5      6      7      8      9      10     11     12     Mean   s
X-value    1.24   1.34   1.39   1.41   1.64   1.44   1.48   1.51   1.54   1.54   1.54   1.62   1.47   0.12
Y-value    1.30   1.50   1.70   1.50   1.44   1.47   1.60   1.60   1.80   1.50   1.70   1.90   1.58   0.17

From EXCEL, the slope and intercept were 0.7856 and 0.4261,
respectively. The regression function can also be displayed on
the graph as a “trendline” function (Figure 7).

FIGURE 7 Scatterplot of results of the example. In the right panel, the
variables have been swapped, i.e., the x-values of the table are on the
vertical axis and the y-values are on the horizontal axis. The function of
the OLR is shown and the coefficient of determination ($r^2$). Note: The
functions are different but describe the same relation, i.e., the y
dependence on x in both examples. The coefficient of determination is
unchanged. The equal-sized units and length of axes facilitate comparing
the regressions with the “equal line” with a slope of 1, i.e., 45°. The
regression lines pass through the average of the values of the dependent
and independent variables.

Using the formulas presented (the sums of squares are calculated
with unrounded standard deviations):

$$SS_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2 = (n - 1) \times s(x)^2 = 11 \times 0.12^2 = 0.148;$$

$$SS_{yy} = 11 \times 0.17^2 = 0.309$$

$$SS_{xy} = \sum_{i=1}^{n}(x_i - \bar{x}) \times (y_i - \bar{y}) = 0.116$$

$$b_{y/x} = \frac{SS_{xy}}{SS_{xx}} = 0.785; \qquad b_{x/y} = \frac{SS_{xy}}{SS_{yy}} = 0.376$$

where $b_{y/x}$ is the slope of the regression function “Y on X.” The
slope of “X on Y” is written $b_{x/y}$. If the slope is not identified by
an index, it is usually $b_{y/x}$:

$$a = \bar{y} - b \times \bar{x} = 1.58 - 0.785 \times 1.47 = 0.426$$

The uncertainty of the slope:

$$u(b) = \frac{s_{y/x}}{\sqrt{SS_{xx}}} = \frac{0.148}{\sqrt{0.148}} = 0.383 \quad (s_{y/x} \text{ is } 0.148; \text{ see Equation 164})$$

The uncertainty of the intercept is

$$u(a) = s_{y/x} \times \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n \times SS_{xx}}} = 0.148 \times \sqrt{\frac{26.226}{12 \times 0.148}} = 0.567$$

Deming Regression 

In practical work, there is usually a variation in the
measurements also of the independent variable. The Deming linear
regression minimizes the perpendicular (ortho) distances
between the observations and a calculated regression line
(Figure 8) by including the ratio between the variance of the
independent and dependent observations. If the ratio is equal
to 1, then the model minimizes the perpendicular distance to
the regression line. This is the orthogonal regression. The
larger the ratio, the more vertical the minimized distance will
be; at high ratios it eventually becomes vertical and the
regression function becomes identical to the OLR.

Before the slope ($b_D$) can be calculated, the ratio between the
measurement variance of the quantities, $s(y)^2$ and $s(x)^2$, must be
defined:

$$\lambda = \frac{[s(y)]^2}{[s(x)]^2} \qquad (153)$$

FIGURE 8 Deming (dotted) and ordinary linear regression (solid) lines.
Thin solid lines between the observations and the regression lines represent
the distances that are minimized. In the left panel the lambda (λ) value is 1,
resulting in minimizing the perpendicular lines (green) to the anticipated
regression line (Deming). In the right panel, λ ≫ 1 and accordingly
the vertical lines to an assumed regression line are minimized (blue). This
is the ordinary linear regression line.


It is an advantage to also calculate a function (V) that occurs
repeatedly in the calculations:

$$V = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2 - \lambda \times \sum_{i=1}^{n}(x_i - \bar{x})^2}{2 \times \sum_{i=1}^{n}[(x_i - \bar{x}) \times (y_i - \bar{y})]} = \frac{SS_{yy} - \lambda \times SS_{xx}}{2 \times SS_{xy}} \qquad (154)$$

$$b_D = V + \sqrt{V^2 + \lambda} \qquad (155)$$

Note In statistical literature, the λ is often defined as
$\frac{[s(x)]^2}{[s(y)]^2}$, i.e., $\lambda_1 = 1/\lambda$. If that definition is used, then the
corresponding changes in Equations (154) to (158) shall be made.

The Deming regression approaches the OLR if the s(y) is
much larger than s(x) and accordingly λ ≫ 1. It may therefore
be more convenient to use λ as defined in Equation (153) in
these discussions. The Deming regression method, like the
OLR, requires that the variance is constant, i.e., homoscedastic,
for both variables and the observations reasonably normally
distributed.

The $b_D$ can also be calculated from the OLR regression coefficient
of Y on X and the Pearson correlation coefficient ($r$). This is
achieved by introducing $b_{y/x} = SS_{xy}/SS_{xx}$ and $b_{x/y} = SS_{xy}/SS_{yy}$
and rearranging Equation (154):

$$V = \frac{SS_{yy} - \lambda_i \times SS_{xx}}{2 \times SS_{xy}} = \frac{SS_{yy}}{2 \times SS_{xy}} - \frac{\lambda_i \times SS_{xx}}{2 \times SS_{xy}} = \frac{1}{2 \times b_{x/y}} - \frac{\lambda_i}{2 \times b_{y/x}} \tag{156}$$

Since

$$r = \sqrt{b_{y/x} \times b_{x/y}}; \quad \frac{1}{b_{x/y}} = \frac{b_{y/x}}{r^2} \tag{157}$$

$$V = \frac{b_{y/x}}{2 \times r^2} - \frac{\lambda_i}{2 \times b_{y/x}} \tag{158}$$

which is then entered into Equation (155). The intercept is estimated
as described in Equation (149).


Example 

Use the same data set as in the previous section, copied here:

Obs.     X-value   Y-value
1        1.24      1.30
2        1.34      1.50
3        1.39      1.70
4        1.41      1.50
5        1.64      1.44
6        1.44      1.47
7        1.48      1.60
8        1.51      1.60
9        1.54      1.80
10       1.54      1.50
11       1.54      1.70
12       1.62      1.90
Mean     1.47      1.58
s        0.12      0.17

Assume that $s(x) = s(y)$ in this experiment. Thus, $\lambda_i = 1$.

$$V = \frac{SS_{yy} - \lambda_i \times SS_{xx}}{2 \times SS_{xy}} = \frac{0.309 - 1 \times 0.148}{2 \times 0.116} = 0.692$$

$$b_D = V + \sqrt{V^2 + \lambda_i} = 0.692 + \sqrt{0.692^2 + 1} = 1.908$$

$$a_D = \bar{y} - b_D \times \bar{x} = 1.584 - 1.908 \times 1.474 = -1.23$$


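The alternative calculation of the slope is

$$b_{x/y} = \frac{r^2}{b_{y/x}} = \frac{0.296}{0.786} = 0.377$$

$$V = \frac{b_{y/x}}{2 \times r^2} - \frac{\lambda_i}{2 \times b_{y/x}} = \frac{0.786}{2 \times 0.296} - \frac{1}{2 \times 0.786} = 0.692$$

etc.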

When performing these calculations, it is essential to retain
as many significant digits as possible and to round only
the final result.
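The Deming calculation can be cross-checked outside EXCEL. The following is a minimal Python sketch of Equations (154), (155), and (149) applied to the example data; the function name `deming` and the parameter `lam` (for the ratio in Equation (153)) are illustrative assumptions, not part of the original text. The slope is taken exactly as written in Equation (155), which presumes a positive $SS_{xy}$.

```python
import math

# Data set from the example (12 paired observations).
x = [1.24, 1.34, 1.39, 1.41, 1.64, 1.44, 1.48, 1.51, 1.54, 1.54, 1.54, 1.62]
y = [1.30, 1.50, 1.70, 1.50, 1.44, 1.47, 1.60, 1.60, 1.80, 1.50, 1.70, 1.90]


def deming(x, y, lam=1.0):
    """Return (slope, intercept) of the Deming regression.

    lam is the ratio s(y)^2 / s(x)^2 (hypothetical parameter name).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_xx = sum((xi - mean_x) ** 2 for xi in x)
    ss_yy = sum((yi - mean_y) ** 2 for yi in y)
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    v = (ss_yy - lam * ss_xx) / (2 * ss_xy)   # Equation (154)
    slope = v + math.sqrt(v ** 2 + lam)       # Equation (155)
    intercept = mean_y - slope * mean_x       # line through the means, Equation (149)
    return slope, intercept


slope, intercept = deming(x, y, lam=1.0)
print(f"b_D = {slope:.3f}, a_D = {intercept:.3f}")
```

Because no intermediate value is rounded, the sketch also illustrates why the intercept is sensitive to the number of digits carried in the slope.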

There are different formulas to estimate the uncertainty of
the slope, $u(b_D)$, and intercept, $u(a_D)$, and they do not always give
the same result.

The uncertainty of the slope of the Deming regression is

$$u(b_D) = \sqrt{\frac{b_D^2 \times (1 - r^2)}{r^2 \times (n - 2)}} = \frac{b_D}{r} \times \sqrt{\frac{1 - r^2}{n - 2}} \tag{159}$$

where $r$ is the Pearson correlation coefficient (see
Equation 165).

The uncertainty of the intercept:

$$u(a_D) = \sqrt{\frac{[u(b_D)]^2 \times \sum_{i=1}^{n} x_i^2}{n}} \tag{160}$$


In the formulas used here, the $r$ (correlation coefficient)
and the sum of the squared results of the independent variable
are necessary. They are available in EXCEL:
CORREL(Y1:Y12;X1:X12) = 0.544 and SUMSQ(X1:X12) = 26.226, respectively:

$$u(b_D) = \sqrt{\frac{b_D^2 \times (1 - r^2)}{r^2 \times (n - 2)}} = \sqrt{\frac{1.90^2 \times (1 - 0.544^2)}{0.544^2 \times (12 - 2)}} = 0.927$$

$$u(a_D) = \sqrt{\frac{[u(b_D)]^2 \times \sum_{i=1}^{n} x_i^2}{n}} = \sqrt{\frac{0.927^2 \times 26.23}{12}} = 1.37$$

Weighted Regression

In a regression analysis, it may not be reasonable to assume 
that every observation should be treated with equal weight. A 
procedure that treats all of the data equally would give less 
precisely measured points, e.g., at the ends of the measuring 
interval, more influence than they should have and would give 
highly precise points too little influence. To minimize the
influence of outliers and extreme values, methods for weighting
the data have been developed. The reader is referred to
textbooks on statistics for further discussions of weighting.


Other Regression Functions 

Two-Point Calibration 


Between two points, one, and only one, straight line can be
drawn. This is utilized in establishing the calibration function
from two concentrations, $(x_1, y_1)$ and $(x_2, y_2)$:

$$b_{y/x} = \frac{y_1 - y_2}{x_1 - x_2} \tag{161}$$

where $x_1$, $x_2$ and $y_1$, $y_2$ are the corresponding results of the
independent and dependent variables, respectively.

The regression (calibration) function is

$$Y - y_1 = b \times (X - x_1) \quad \text{or} \quad Y - y_2 = b \times (X - x_2) \tag{162}$$

The average of the two points, $(x_1, y_1)$ and $(x_2, y_2)$, may also be
used to establish the regression function, instead of either of
the measured points.

A “two-point calibration” assumes a linear relation between
the quantities.

The “two-point” formula can also be used to establish a recalibration
function from patient or control samples.

The Pearson correlation coefficient (r) for a function derived
from only two points will, by definition, be ±1.
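A two-point calibration per Equations (161) and (162) can be sketched in Python as follows. The function name `two_point_calibration` and the calibrator values are hypothetical and serve only as an illustration.

```python
def two_point_calibration(p1, p2):
    """Return (slope, intercept) of the straight line through p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two calibrators must have different x-values")
    slope = (y1 - y2) / (x1 - x2)   # Equation (161)
    intercept = y1 - slope * x1     # rearranged from Y - y1 = b * (X - x1), Equation (162)
    return slope, intercept


# Hypothetical calibrators: (concentration, signal).
slope, intercept = two_point_calibration((2.0, 0.21), (10.0, 1.01))
print(slope, intercept)
```

The same function can be fed with the averages of two groups of samples, which is how the Bartlett procedure described later obtains its slope.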


Regression from two intervals 

The standard deviations s(x) and s(y) represent the distribution
of the values. It is reasonable that the ratio between s(y)
and s(x) is a measure of the slope ($b_{y/x}$):

$$b_{y/x} = \frac{s(y)}{s(x)} \tag{163}$$

The intercept is then estimated as above (149) using the
coordinates of the average.


Example 

In the previous example, the s(y) and s(x) were 0.168 and 0.116,
respectively. The estimated $b_{y/x}$ = 1.45, intercept = −0.54. The
usefulness and accuracy of this approximation of $b_{y/x}$
depend on the underlying distributions, which must be normal
to ensure that the calculated s(y) and s(x) represent the
spread of the data. Great care should be exercised in the use
of the method. Since the s(x) and s(y) are the positive roots of
the corresponding variances, this method will always result in
$b_{y/x} \geq 0$.

Compare OLR Y = 0.786X + 0.426; Deming Y = 1.90X − 1.23.

Bartlett Regression 

Order the data set according to the independent variable
and divide it into three intervals (low, mid, and high) with equal
numbers of observations. If the number of observations is not a
multiple of three, adjust the mid interval so that the low and high
intervals include the same number of observations. Calculate the
averages of x and y in the high and low intervals, calculate the slope
according to the two-point formula (161), and calculate the intercept
from the averages of all the y- and x-values. The Bartlett
regression is assumed to allow a measurement uncertainty
in both the dependent and independent variables.

Passing-Bablok Regression 

Other techniques have been developed to accommodate
variances in the results of both variables, and the Passing-Bablok
regression, similar to the Theil-Sen estimator, is the most
favoured. These are nonparametric and thus do not assume
any particular distribution. Essentially, the Passing-Bablok
(P-B) regression calculates the slope of every possible line connecting
two observations, excluding those slopes which are 0 or undefined.
The slope of the regression line for all observations (b) is the
median of the slopes of all the connecting lines. The intercept (a)
is then calculated from the median or mean of all observations.
As in all calculations based on medians, the influence of outliers is
less than for parametric methods.

The Passing-Bablok regression requires comparatively
extensive calculations.

Example 

Use the data in the previous example; calculate the average 
of the low and high thirds. The slope ( b ) and intercept (a) of the 
other described functions are summarized. 

Note The different regression functions are designed to 
handle measurement uncertainties differently which is not 
reflected in the used dataset. 



            Low      High     All
Avg (x)     1.345    1.585    1.474
Avg (y)     1.500    1.635    1.584

            Bartlett   OLR     DLR      P-B
b           0.56       0.79    1.90     1.67
a           0.76       0.43    −1.23    −0.86


The regression lines are displayed in Figure 9. 
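The Bartlett figures in the table can be reproduced with a short Python sketch. The function name `bartlett` is an illustrative assumption. It is also assumed that observations with tied x-values at an interval boundary keep their original order, because the text does not specify how such ties are assigned.

```python
x = [1.24, 1.34, 1.39, 1.41, 1.64, 1.44, 1.48, 1.51, 1.54, 1.54, 1.54, 1.62]
y = [1.30, 1.50, 1.70, 1.50, 1.44, 1.47, 1.60, 1.60, 1.80, 1.50, 1.70, 1.90]


def mean(values):
    return sum(values) / len(values)


def bartlett(x, y):
    """Return (slope, intercept) from the low and high thirds of the data."""
    # Order by the independent variable; sorted() is stable, so ties keep input order.
    pairs = sorted(zip(x, y), key=lambda p: p[0])
    k = len(pairs) // 3                      # size of the low and of the high interval
    low, high = pairs[:k], pairs[-k:]
    x_low, y_low = mean([p[0] for p in low]), mean([p[1] for p in low])
    x_high, y_high = mean([p[0] for p in high]), mean([p[1] for p in high])
    slope = (y_high - y_low) / (x_high - x_low)   # two-point formula (161)
    intercept = mean(y) - slope * mean(x)         # line through the overall averages
    return slope, intercept


slope, intercept = bartlett(x, y)
print(f"b = {slope:.2f}, a = {intercept:.2f}")
```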



FIGURE 9 Regression lines calculated according to ordinary linear regression
(OLR), Deming regression (DLR), Bartlett and Passing-Bablok, from the
same dataset.

Linearity

Linearity is characterized by a first-order function that 
describes the relation between a signal, e.g., light absorbance 
and the concentration of an analyte. 

If the intercept is zero, the function will provide results that
are directly proportional to the concentration of the analyte in
the sample. A sample that is diluted 1 + 1 shall therefore give
a signal that is half of the original signal and the result shall be
half of that of the original sample. This is not always the case in
biological samples; a reason may be that inhibitors are inefficient
or more potent in diluted samples.

If two methods purporting to measure the same quantity are
linearly related, they may be assumed to in fact measure the
same quantity irrespective of the numerical results. This
makes recalibration using a reference procedure possible.

A thorough procedure for evaluating the linearity of a measurement
procedure has been published as EP6 by the CLSI
(www.CLSI.org).


Higher Order Regressions 

It is not uncommon that a relation between quantities is nonlinear,
i.e., the function that describes the relation is of a higher
order. Because linear regressions are easy to apply, analysts
usually try to linearize the functions, as has been described
above (Table 2 and Figure 2).

Higher order functions may be fitted to data by special procedures
which are available in some statistical packages.
EXCEL offers five different trendlines to be fitted to tabled
data. They are available by activating a data set in a graph
“FORMAT DATA SERIES.”


Residuals 

Intuitively, the spread is related to the distances between
the observations and the estimated regression line. These
distances are the “residuals” and their standard deviation
($s_{y.x}$) is

$$s_{y.x} = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{(n - 1) \times [s_y^2 - b_{y/x}^2 \times s_x^2]}{n - 2}} \tag{164}$$

where $b_{y/x}$ is the slope, $s_y$ and $s_x$ are the standard deviations of the
y and x variables, respectively, and $\hat{y}_i$ (y-hat) is the y-value
calculated using the regression function at a particular $x_i$.
The $y_i - \hat{y}_i$ is also recognized as the “error term.”

In EXCEL, the $s_{y.x}$ is calculated by the function STEYX(y-values, x-values).

The $s_{y.x}$ is also recognized as residual standard deviation
(rsd or $s_{res}$), residual standard error (rse), standard deviation
of the line (sdl), standard error of the estimate (see), or linear
residual standard deviation (ressd).
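Both forms of Equation (164) can be compared numerically. The Python sketch below uses the 12 paired observations of the regression example and an ordinary least-squares fit; the variable names are illustrative assumptions.

```python
import math
import statistics

x = [1.24, 1.34, 1.39, 1.41, 1.64, 1.44, 1.48, 1.51, 1.54, 1.54, 1.54, 1.62]
y = [1.30, 1.50, 1.70, 1.50, 1.44, 1.47, 1.60, 1.60, 1.80, 1.50, 1.70, 1.90]

n = len(x)
mean_x, mean_y = statistics.mean(x), statistics.mean(y)
ss_xx = sum((xi - mean_x) ** 2 for xi in x)
ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b = ss_xy / ss_xx            # OLR slope b_y/x
a = mean_y - b * mean_x      # OLR intercept

# First form: sum of squared residuals divided by n - 2.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
s_yx_from_residuals = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Second form: from the sample variances of y and x and the slope.
var_x, var_y = statistics.variance(x), statistics.variance(y)
s_yx_from_variances = math.sqrt((n - 1) * (var_y - b ** 2 * var_x) / (n - 2))

print(f"{s_yx_from_residuals:.4f} {s_yx_from_variances:.4f}")
```

The second form needs only summary statistics, which is convenient when the individual residuals are not tabulated.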


CORRELATION AND COVARIANCE 


Correlation describes and quantifies the strength and direction
of a linear relationship between two random variables.

Correlation Coefficient 

A correlation coefficient describes the dependence of two
random variables and thus describes the spread of the
observed pairs of observations. This dependence is quantitatively
described by the Pearson product-moment correlation coefficient,
which can be calculated by many seemingly different formulas. They
can be derived from each other but represent different ways to
visualize the correlation coefficient:

$$
\begin{aligned}
r &= \frac{\sum_{i=1}^{n}[(x_i - \bar{x}) \times (y_i - \bar{y})]}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \times \sum_{i=1}^{n}(y_i - \bar{y})^2}} = \frac{SS_{xy}}{\sqrt{SS_{xx} \times SS_{yy}}} = b_{y/x} \times \sqrt{\frac{SS_{xx}}{SS_{yy}}} \\
&= \sqrt{\frac{SS_{xy}}{SS_{xx}} \times \frac{SS_{xy}}{SS_{yy}}} = \sqrt{b_{y/x} \times b_{x/y}} = b_{y/x} \times \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}} = b_{y/x} \times \frac{s(x)}{s(y)}
\end{aligned}
\tag{165}
$$

$SS_{xy}$, $SS_{xx}$, and $SS_{yy}$ are defined in Equations (145)-(147).
The formula can also be written in yet another form that
avoids calculation of the means:

$$r = \frac{n \times \sum x_i y_i - \sum x_i \times \sum y_i}{\sqrt{[n \times \sum x_i^2 - (\sum x_i)^2] \times [n \times \sum y_i^2 - (\sum y_i)^2]}} \tag{166}$$

r can assume values between (−1) and (+1), i.e., $-1 \leq r \leq +1$
or $|r| \leq 1$.

All calculations of r include the mean, the standard deviation,
or derivatives thereof and thus require that the data are
normally or close to normally distributed. The correlation
coefficient describes the scatter of the observations or the
association between the variables. Thus an r = 0 indicates no
association and r = 1 a perfect direct association, whereas
r = −1 indicates a perfect inverse association (Figure 10).

FIGURE 10 The Pearson correlation coefficient calculated for differently
distributed data sets. The top row illustrates that the correlation coefficient
describes the scatter, and its sign the slope of a linear regression (middle).
The many nonparametric data sets in the bottom row have a correlation
coefficient of 0. From Wikipedia Commons.


The correlation coefficient can be calculated for any data set
and only describes the spread of the observations in a two-dimensional
scatter plot. The relation of r to the OLR function
is formal as shown in Equation (165). This does not exclude a
relation between the correlation coefficient and the linear
regression, and it will be sensitive only to a linear relationship
between two variables (which may exist even if one is a nonlinear
function of the other).

If the variables are expressed as vectors (centered on their means),
the cos(φ), where φ is the angle between the vectors, will be equal
to r. cos(φ) is equal to

$$\cos(\varphi) = \frac{\sum_{i=1}^{n}[(x_i - \bar{x}) \times (y_i - \bar{y})]}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \times \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

which is already identified as one of the definitions of r (165).

If the slope is calculated according to Equation (161),
then r = 1, which would be expected since the distribution
of the data has been simplified to one number, representing
an interval. It is an axiom that between two points,
only one straight line can be drawn. Consequently, the r
will be ±1.

Even a high correlation coefficient does not imply a causal
relationship between the variables.

Coefficient of Determination (Pearson)

The square of the correlation coefficient ($r^2$), also known as
the coefficient of determination, is understood as the fraction of
the variation in $y_i$ that is accounted for by a linear fit of $y_i$ to
$x_i$, or differently expressed, the proportion of variance in
common between the two variables. For example, for r = 0.7,
$r^2$ = 0.49 and thus, only 49 % of the variation is explained by
the linear fit. The coefficient of determination is the quantity
that should be interpreted; the correlation coefficient overestimates
the association between the variables.

The relation between the residual standard deviation and
the correlation coefficient and coefficient of determination
can be expressed as:

$$s_{y.x} = s(y) \times \sqrt{\frac{n - 1}{n - 2} \times (1 - r^2)} = s(y) \times \sqrt{\frac{n - 1}{n - 2}} \times \sqrt{1 - r^2} \tag{167}$$

If n − 1 approaches n − 2, then

$$s_{y.x} \approx s(y) \times \sqrt{1 - r^2} \quad \text{or} \quad r^2 \approx 1 - \frac{s_{y.x}^2}{[s(y)]^2} \tag{168}$$

Consequently, the smaller the ratio between $s_{y.x}$ and s(y) the
larger $r^2$. This relation is important to observe in evaluation of
a comparison of results by regression analysis.

It should be stated that correlation does not equal causation, as
already pointed out. There are many reasons to be careful when drawing
conclusions from correlation coefficients; even if the p-value
indicates a high degree of probability (or significance), this
may not give a clue to the root cause. It is essential also to view
the correlation in relation to the regression, e.g., in a scatterplot.
As will be shown below, the significance of r is highly dependent
on the number of observations.


Example

The data set used as an example in the section on “Regression”
is used again:

Obs.     X-value   Y-value
1        1.24      1.30
2        1.34      1.50
3        1.39      1.70
4        1.41      1.50
5        1.64      1.44
6        1.44      1.47
7        1.48      1.60
8        1.51      1.60
9        1.54      1.80
10       1.54      1.50
11       1.54      1.70
12       1.62      1.90
Mean     1.47      1.58
s        0.12      0.17


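The correlation coefficient, r, according to EXCEL is 0.544.
Let us apply the first and last expressions in the chain of the
algebraically different formulas above (165), which give identical
results:

$$r = \frac{\sum_{i=1}^{n}[(x_i - \bar{x}) \times (y_i - \bar{y})]}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \times \sum_{i=1}^{n}(y_i - \bar{y})^2}} = \frac{0.116492}{\sqrt{0.148292 \times 0.309492}} = \frac{0.1165}{0.2142} = 0.544$$

$$r = b_{y/x} \times \frac{s(x)}{s(y)} = 0.7855 \times \frac{0.1161}{0.1677} = 0.544$$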
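That the different expressions for r give the same number can be checked with a short Python sketch on the same 12 pairs. The variable names are illustrative assumptions; the last variant is the standard computational form that avoids the means.

```python
import math
import statistics

x = [1.24, 1.34, 1.39, 1.41, 1.64, 1.44, 1.48, 1.51, 1.54, 1.54, 1.54, 1.62]
y = [1.30, 1.50, 1.70, 1.50, 1.44, 1.47, 1.60, 1.60, 1.80, 1.50, 1.70, 1.90]

n = len(x)
mean_x, mean_y = statistics.mean(x), statistics.mean(y)
ss_xx = sum((xi - mean_x) ** 2 for xi in x)
ss_yy = sum((yi - mean_y) ** 2 for yi in y)
ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b_yx = ss_xy / ss_xx   # regression of y on x
b_xy = ss_xy / ss_yy   # regression of x on y

r_sums = ss_xy / math.sqrt(ss_xx * ss_yy)
r_slopes = math.sqrt(b_yx * b_xy)   # gives |r|; the sign follows the slope
r_sd = b_yx * statistics.stdev(x) / statistics.stdev(y)

# Form without the means: raw sums only.
sx, sy = sum(x), sum(y)
sxx = sum(xi * xi for xi in x)
syy = sum(yi * yi for yi in y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
r_raw = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

print(f"{r_sums:.3f} {r_slopes:.3f} {r_sd:.3f} {r_raw:.3f}")
```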


Spearman Rank Correlation 

The Pearson product-moment correlation, r, assumes Gaussian
distributed data. If this is not the case, the Spearman's rank correlation
is used to test the direction and strength of the relationship
between two variables.

Spearman's rank correlation ($r_s$) or ρ (rho).

The Spearman rank correlation coefficient is a nonparametric
correlation coefficient. It only addresses the ranks of
independently ranked variables. Calculating the Pearson correlation
from data ranked in ascending order will give an
approximate value of the $r_s$.

The $r_s$ can also be estimated according to

$$r_s = 1 - \frac{6 \times \sum_{i=1}^{n} d_i^2}{n \times (n^2 - 1)} \tag{169}$$

where n is the number of pairs and $d_i$ the difference between
the ranks.

If ties occur in either of the data sets, Equation (169) should,
in theory, not be used unless the ties are resolved. However,
the effect of a few ties is usually small.

Resolving ties means that tied values are given the average of the
ranks they occupy. In EXCEL this is

IF(ISNUMBER(CR), ((RANK(CR, C$R1:C$Rn, 0) + COUNT(C$R1:C$Rn) - RANK(CR, C$R1:C$Rn, 1) + 1)/2), "")   (170)

where C denotes Column, R Row, R1 the first observation, and
Rn the last observation. In EXCEL 2010 the function
RANK.AVG(CR,C$R1:C$Rn,1) will give the same effect.

Example 

Assume a set of pair-wise observations: 


Obs. 1   Obs. 2   Rank 1   Rank 2   Diff., d_i
88       105      4        8        −4
94       93       9        3        6
83       69       2        1        1
91       91       7        2        5
90       107      6        9        −3
89       100      5        7        −2
82       96       1        5        −4
93       99       8        6        2
83       95       2        4        −2
102      110      10       10       0


Calculate the ranks in increasing order, for instance using
the EXCEL function: RANK(Ri,R1:Rn,1).

Applying the Pearson product-moment correlation to the
ranks estimates the $r_s$ to 0.33.

Alternatively, calculate the difference between the ranks ($d_i$)
and apply to Equation (169):

$$r_s = 1 - \frac{6 \times 115}{10 \times (10^2 - 1)} = 1 - 0.697 = 0.30$$

Standard statistical programs, e.g., SigmaPlot, Prism/GraphPad,
Statistica, JMP return $r_s$ = 0.30.
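The effect of the tie (the value 83 occurs twice in Obs. 1) can be explored with a Python sketch. Here tied values receive the average of their ranks, so the results differ slightly from the hand calculation, where both tied values were given rank 2. The helper names `average_ranks` and `pearson` are illustrative assumptions.

```python
import math


def average_ranks(values):
    """Ascending ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    s_uv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    s_uu = sum((a - mu) ** 2 for a in u)
    s_vv = sum((b - mv) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


obs1 = [88, 94, 83, 91, 90, 89, 82, 93, 83, 102]
obs2 = [105, 93, 69, 91, 107, 100, 96, 99, 95, 110]

rank1, rank2 = average_ranks(obs1), average_ranks(obs2)
n = len(obs1)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank1, rank2))
rs_formula = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))   # Equation (169)
rs_pearson = pearson(rank1, rank2)                 # Pearson r applied to the ranks

print(f"sum d^2 = {sum_d2}, formula = {rs_formula:.3f}, Pearson on ranks = {rs_pearson:.3f}")
```

With average ranks both routes round to 0.30 or 0.31, in line with the value reported by the statistical programs.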


Significance of r 

The standard error of the correlation coefficient is

$$se(r) = \sqrt{\frac{1 - r^2}{n - 2}} \tag{171}$$

where n is the number of individuals (samples, i.e., pairs)
in the data set. Strictly, this should only be used for large samples
(>100).

The significance of a correlation between two variables is
estimated by Student's t-test:

$$|t| = r \times \sqrt{\frac{n - 2}{1 - r^2}}; \quad df = n - 2 \tag{172}$$

The t-value is evaluated by the usual t-distribution table
and thus allows the estimation of the statistical significance
of r.

Note High values of t, signaling statistical significance, 
will be obtained with a large number of observations even 
if r and thus r 2 are small, indicating a limited explanation 
by the correlation. The interpretation of the r will vary 
depending on the context. Thus, r = 0.8 may be regarded as 
unsatisfactorily low when comparing two measurement 
procedures in chemistry or physics, whereas it might be 
very differently appreciated in social or medical 
correlation studies where confounding factors may be 
more abundant. 

The CI for r is estimated after Fisher's transformation of
the r:

$$Z = \frac{1}{2} \times [\ln(1 + r) - \ln(1 - r)] = \frac{1}{2} \times \ln\left(\frac{1 + r}{1 - r}\right) \tag{173}$$

EXCEL offers a function for direct calculation of
Z: FISHER(r).

The standard error of Z is

$$s_Z = \frac{1}{\sqrt{n - 3}} \tag{174}$$

Thus, the CI will be

$$CI_Z = Z \pm z \times \frac{1}{\sqrt{n - 3}} = Z \pm z \times s_Z \tag{175}$$

where z corresponds to the confidence level, e.g., 1.96 for 95 %
confidence level.

Note The difference between capital Z and lower case z!

The endpoints of the $CI_Z$ ($Z - z \times s_Z$ and $Z + z \times s_Z$,
respectively) are entered into Equation (176) to define the confidence
limits of r.

$$r_{lim} = \frac{e^{2 \times Z_{lim}} - 1}{e^{2 \times Z_{lim}} + 1} \tag{176}$$

Note The $CI_r$ is not symmetrical around r.


Example

The correlation coefficient, r, for a linear regression of 35
values was 0.91. Calculate the 95 % CI for r.

Fisher's Z-value (173) is 1.53. $z \times s_Z$ = 1.96 × 0.18 = 0.35
(174). Thus, the $CI_Z$: 1.18-1.88 (175), corresponding to $CI_r$:
0.83-0.95 when the endpoints of $CI_Z$ are inserted into
Equation (176).

Note The CI is asymmetric around the estimated r.
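The t-statistic and the Fisher confidence interval can be sketched in Python with the standard library only. The function names `t_statistic` and `fisher_ci` are illustrative assumptions; the back-transformation is the hyperbolic tangent written out with exponentials.

```python
import math
from statistics import NormalDist


def t_statistic(r, n):
    """Student's t for a correlation coefficient with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def fisher_ci(r, n, level=0.95):
    """Confidence limits of r via Fisher's Z transformation."""
    z_r = 0.5 * math.log((1 + r) / (1 - r))      # Fisher's Z
    s_z = 1 / math.sqrt(n - 3)                   # standard error of Z
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for a 95 % level
    limits_z = (z_r - z * s_z, z_r + z * s_z)
    # Back-transform each endpoint from the Z scale to the r scale.
    return tuple((math.exp(2 * v) - 1) / (math.exp(2 * v) + 1) for v in limits_z)


low, high = fisher_ci(0.91, 35)
t = t_statistic(0.91, 35)
print(f"CI_r = {low:.2f} to {high:.2f}, t = {t:.1f}")
```

The interval reaches further below 0.91 than above it, which is the asymmetry noted in the example.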


Covariance 

The dependence between two random variables is also
described by the covariance, which is a measure of how two variables
vary together.

The covariance can be derived from the calculation of variance
(23)

$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$

which can be written as

$$s^2 = \frac{\sum_{i=1}^{n}[(x_i - \bar{x}) \times (x_i - \bar{x})]}{n - 1}$$

The covariance (sample) between x and y is then

$$\text{covariance} = \frac{\sum_{i=1}^{n}[(x_i - \bar{x}) \times (y_i - \bar{y})]}{n - 1} = \frac{1}{n - 1} \times SS_{xy} = \frac{r \times \sqrt{SS_{xx} \times SS_{yy}}}{n - 1} \tag{177}$$

Thus

$$r = \frac{(n - 1) \times \text{covariance}}{\sqrt{SS_{xx} \times SS_{yy}}} = \frac{(n - 1) \times \text{covariance}}{\sqrt{(n - 1) \times var(x_i) \times (n - 1) \times var(y_i)}} \tag{178}$$

and

$$r = \frac{\text{covariance}}{\sqrt{var(x_i) \times var(y_i)}} = \frac{\text{covariance}}{s(x_i) \times s(y_i)} \tag{179}$$

It is always safe to use the sample variance to estimate the
correlation coefficient. The correlation coefficient is therefore
a quantity derived from the covariance and is also expressed
as the normalized covariance.

The importance of covariance can be shown by an example:
Suppose we have two data sets A and B and know the variances
of the data sets and their covariance.

Then the variance of A + B is

VAR(A + B) = VAR(A) + VAR(B) + 2 × COVAR(A,B).

The covariance has a great influence. When the number of
samples reaches 200, the difference between using the
population and sample covariance is about 0.5 %.

It should be noted that the covariance in
Equations (177)-(179) refers to the covariance between observations
(x- and y-values), not to the covariance between the
slope and intercept of the regression function.

The covariance between the slope (b) and the intercept (a) is

$$cov(a, b) = -\bar{x} \times [u(b_{y/x})]^2 \tag{180}$$
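The relations between covariance, correlation, and the variance of a sum can be verified numerically. The Python sketch below uses the 12 paired observations from the regression example; the variable names are illustrative assumptions.

```python
import statistics

x = [1.24, 1.34, 1.39, 1.41, 1.64, 1.44, 1.48, 1.51, 1.54, 1.54, 1.54, 1.62]
y = [1.30, 1.50, 1.70, 1.50, 1.44, 1.47, 1.60, 1.60, 1.80, 1.50, 1.70, 1.90]

n = len(x)
mean_x, mean_y = statistics.mean(x), statistics.mean(y)
ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

cov_sample = ss_xy / (n - 1)     # sample covariance
cov_population = ss_xy / n       # population covariance

# Correlation as the normalized covariance.
r = cov_sample / (statistics.stdev(x) * statistics.stdev(y))

# Variance of a sum: VAR(A + B) = VAR(A) + VAR(B) + 2 x COVAR(A, B).
sums = [xi + yi for xi, yi in zip(x, y)]
var_of_sum = statistics.variance(sums)
var_from_parts = statistics.variance(x) + statistics.variance(y) + 2 * cov_sample

print(f"cov = {cov_sample:.4f}, r = {r:.3f}, VAR(A+B) = {var_of_sum:.4f}")
```

The ratio of the population to the sample covariance is (n − 1)/n, here 11/12.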

Correlation and Covariance Matrix 

Correlation and covariance can only be calculated between
two variables at a time. However, if there are a number of data sets,
it may be convenient to calculate the correlation and covariance
between all of them. The outcome is a correlation/covariance
matrix. EXCEL has built-in functions (Figure 11) that can be
found under the Add-in Data Analysis: CORRELATION
and COVARIANCE, respectively. These commands directly
calculate the desired matrix comprising the covariance or correlation
between all the columns or rows, as specified.

Note In the covariance matrix, the diagonal displays the
variances of the data in the columns (23) and the other
cells the covariance terms.

The correlation coefficient is independent of the number of
observations (165), whereas the covariance includes the
number of observations (177). Accordingly, EXCEL offers
two functions, COVARIANCE.P(interval y_i, interval x_i) (179) and
COVARIANCE.S(interval y_i, interval x_i) (178), for calculating
the covariance of a population and a sample, respectively.
Thus, the population covariance is (n − 1)/n times the sample
covariance. The function in the data analysis operates with
the population covariances.

Covariance
            Column 1   Column 2   Column 3   Column 4
Column 1    0.0357
Column 2    0.0013     0.0437
Column 3    0.0032     0.0001     0.0304
Column 4    0.0094     −0.0050    −0.0028    0.0584

Correlation
            Column 1   Column 2   Column 3   Column 4
Column 1    1.0000
Column 2    0.0321     1.0000
Column 3    0.0960     0.0037     1.0000
Column 4    0.2049     −0.0991    −0.0666    1.0000

FIGURE 11 Screen dumps of calculation of the covariance and correlation
matrices in EXCEL. The diagonals of the covariance matrix are the variances
of the column, whereas the nondiagonals are the respective covariances.
The diagonals of the correlation matrix are 1.


Outliers 

Results of a data set that appear to differ unreasonably from
the rest are called outliers. It is common practice to calculate
regressions and basic statistics with and without suspected
outliers and evaluate the results.

A recommended formal test for outliers is the Grubbs' test
in which the statistic G is calculated:

$$G = \frac{|\text{suspect value} - \bar{x}|}{s} \tag{181}$$

where the mean and standard deviation (s) are calculated
including the suspect value.

The G is then evaluated using a special table. Formula (181)
is applicable if one value is suspected to be an outlier either
at the upper or at the lower end of the distribution. Also
compare (64). Other formulas are available for more complex
situations.

The critical value can be calculated

$$G_{crit} = \frac{n - 1}{\sqrt{n}} \times \sqrt{\frac{t^2_{(\alpha/(2n),\, n-2)}}{n - 2 + t^2_{(\alpha/(2n),\, n-2)}}} \tag{182}$$

where $t_{(\alpha/(2n),\, n-2)}$ is the critical value of the t-distribution with
n − 2 degrees of freedom and a significance level of α/(2n). This
applies to a two-sided test; for a one-sided test use α/n. If
the G is larger than the table value or that calculated from
Equation (182), then the extreme value is unlikely to have
occurred by chance. The table can be found in ISO 5725-2.

The Grubbs' test should not be applied if the number of
observations is less than 6 or more than 50.

The G-statistic as described above is equal to the z-score (51)
and can be evaluated as such by using a normal cumulative
table or the EXCEL: NORMSDIST(G).


Example 


A set of observations had a mean of 45.8 and s(x) of 5.1. One
observation of 58 was a suspected outlier.

$$G = z = \frac{58 - 45.8}{5.1} = 2.4$$

which is more than the expected
z = 1.96 for a 97.5 % probability of not belonging to the distribution.
Note that 58 is above the mean and thus a one-sided evaluation.
The probability of belonging to the distribution is
1 − NORMSDIST(2.4) = 0.008 (one-sided).

Another test for outliers is the Dixon test (Q-test):

$$Q = \frac{|\text{suspect value} - \text{nearest value}|}{\text{range of values}} \tag{183}$$

The critical values can be found in a special table. The Dixon
test is usually applicable to 3-10 observations. The Grubbs' and
Dixon's tests assume a Gaussian distribution of the quantity
values.
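The two outlier statistics can be sketched in Python. The worked example (mean 45.8, s 5.1, suspect value 58) is taken from the text; the seven-value data set and the function names `grubbs_g` and `dixon_q` are hypothetical. Critical values are not computed here and must still be looked up in the appropriate tables.

```python
from statistics import NormalDist, mean, stdev


def grubbs_g(values, suspect):
    """Grubbs' G: distance of the suspect value from the mean in units of s."""
    return abs(suspect - mean(values)) / stdev(values)


def dixon_q(values, suspect):
    """Dixon's Q: gap to the nearest value divided by the range."""
    ordered = sorted(values)
    if suspect == ordered[-1]:
        nearest = ordered[-2]
    elif suspect == ordered[0]:
        nearest = ordered[1]
    else:
        raise ValueError("the suspect value must be the lowest or highest value")
    return abs(suspect - nearest) / (ordered[-1] - ordered[0])


# Worked example from the text, evaluated as a z-score (one-sided).
g_example = (58 - 45.8) / 5.1
p_one_sided = 1 - NormalDist().cdf(g_example)

# Hypothetical data set with one suspiciously high value.
data = [45.1, 44.8, 46.0, 45.5, 44.9, 45.7, 52.3]
g_data = grubbs_g(data, 52.3)
q_data = dixon_q(data, 52.3)

print(f"G = {g_example:.2f}, p = {p_one_sided:.4f}, G(data) = {g_data:.2f}, Q(data) = {q_data:.2f}")
```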

Rejection of suspect values should be made with great care.
Even though grossly deviating values may belong to the distribution
in question, they may have an undue effect on calculated
quantities, e.g., mean, standard deviation, regression, and
correlation. The effect of suspect outliers can be minimized by
trimming or winsorizing the data set, (64) and (65), respectively.
Outliers are of less or no importance in nonparametric calculations.


Leverage 

Extreme values and outliers may have an influence
(leverage) on the regression. An observation far from the centroid
(mean of X and Y) is usually a leverage point but not necessarily
an influence point. Influence points have an influence
on the regression function and have a tendency to “draw the
line closer.” Leverage points that are not influence points
may have a profound effect on the correlation coefficient
and the variance of the variables. The larger the leverage,
the larger influence it will have on one or several of the properties
of the regression.

The leverage of an observation $x_i$ can be expressed
quantitatively:

$$h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n}(x_j - \bar{x})^2} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{(n - 1) \times s_x^2} = \frac{1}{n} + \frac{1}{n - 1} \times \left(\frac{x_i - \bar{x}}{s_x}\right)^2 \tag{184}$$

where $0 < h_i < 1$, and n is the number of observations.

The leverage is thus mainly influenced by the distance of the
independent variable from the centroid. The last factor in Equation
(184) is equal to the squared z-score (51) of the observation.
The larger $h_i$, the more influence it has on the regression. If the
observation is in line with a regression line dominated by the
bulk of observations (characterized by a small residual), it will
have little influence on the regression but on the variance of the
distribution of the variables. Often an $h_i$ of 0.9 is used as a cutoff
value. Extreme high or low quantity values may have a large
influence but still not be regarded as “outliers” if they are found
on or close to an otherwise defined regression line.

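Incidentally, the variance of the predicted value $\hat{y}$ includes
part of the leverage:

$$s_{\hat{y}}^2 = s_{y.x}^2 \times \left[\frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n}(x_j - \bar{x})^2}\right] \tag{185}$$

If $h_i$ exceeds

$$h_i > \frac{2 \times p}{n} \quad \text{or} \quad h_i > \frac{3 \times p}{n} \tag{186}$$

where p is the number of predictors (for bivariate linear regression
p = 1), the point is regarded as a leverage point and needs
special consideration.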
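The leverage of each observation in the example data set can be computed with a few lines of Python. The sketch compares the sum-of-squares form with the z-score form of the leverage formula; the variable names are illustrative assumptions.

```python
import statistics

x = [1.24, 1.34, 1.39, 1.41, 1.64, 1.44, 1.48, 1.51, 1.54, 1.54, 1.54, 1.62]

n = len(x)
mean_x = statistics.mean(x)
s_x = statistics.stdev(x)
ss_xx = sum((xi - mean_x) ** 2 for xi in x)

# Leverage from the sum of squares of x.
h_from_ss = [1 / n + (xi - mean_x) ** 2 / ss_xx for xi in x]

# Leverage from the squared z-score of each x-value.
h_from_z = [1 / n + ((xi - mean_x) / s_x) ** 2 / (n - 1) for xi in x]

largest = max(h_from_ss)
print(f"largest leverage = {largest:.3f} at x = {x[h_from_ss.index(largest)]}")
print(f"sum of leverages = {sum(h_from_ss):.3f}")
```

For a straight-line fit with an intercept the leverages sum to 2, so the average leverage is 2/n; the lowest x-value (1.24) lies furthest from the mean and therefore has the largest leverage.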

The number of observations in a regression analysis is 
crucial. The number of observations is directly included in 
the calculations of the slope and its uncertainty, the intercept, 
the correlation coefficient, and the leverage. 


COMPARING QUANTITIES 


Ideally, a calibration of a measurement procedure with the 
same calibrator would provide the same result when the same
quantity is measured in the same sample. For several reasons,
this is not always the case when analyzing biological samples. 
Therefore, laboratories with many instruments for measuring 
the same quantities compare the performance of measurement 
procedures using real samples, usually patient samples. 

In clinical research, comparisons of outcomes of studied
diagnostic procedures or treatments are important strategies,
often with a view to finding and confirming the diagnostic usefulness
of a marker or to finding surrogate markers for complex physiological
phenomena. Both questions may be addressed and
answered by comparison of results from measurements of
patient samples and evaluating the results by regression and
correlation studies.

If the same quantities are measured in a method comparison,
it is fair to assume that the regression will be linear. If different
quantities are measured, e.g., in a calibration of a
measurement procedure or comparison of diagnostic procedures,
the regression may take any form and be described
by, for instance, linear, logarithmic, exponential, or polynomial
functions. It is thus logical to suspect that if the regression is
nonlinear and the correlation very poor, the procedures measure
different quantities.

Particularly in clinical experiments, the correlation may be
poor, often due to imprecision or interfering substances, generally,
or at certain concentrations. As demonstrated in Equation
(172), high t-values and thus significance may be
obtained in a comparison at a given correlation coefficient
(r), simply by increasing the number of observations. This
calls for great care before far-reaching conclusions
are drawn from a significant r-value.


Graphical Representation 

It is usually recommended that a scattergram (Figures 7 and
12 (left)) is first created in a comparison. This is to give the scientist
a broad overview of the distribution of results and spot
possible outliers. Usually, the “equal line” and some regression
function are also presented.

To facilitate the evaluation of a comparison, “difference
graphs” (Figure 12, right) are also usually constructed and
often demanded for publication in scientific journals. The
difference graph displays the difference between the measurements
plotted against the mean of the results or, if the comparative
method (independent variable) can be regarded as a
reference measurement procedure, against these values
(Var 1) directly. Difference plots with the mean of the variables
as the independent variable are known as Bland-Altman
graphs.

FIGURE 12 Regression and difference graphs. The Deming and OLR functions
are shown, Y = 1.30X − 0.30 and Y = 1.09X + 0.08, respectively. Equal
variances ($\lambda_i$ = 1) for the methods were assumed for the Deming regression.
Both regressions are centered on the average of the dependent and independent
variables, and therefore, the regression lines cross at that point. The difference
between the mean and median is enhanced in the difference graph.
Note: The scales are equal. In the difference graph the independent variable
was chosen as reference. The X-axis represents the equal line in the scatterplot
and the regression function is Y = 0.09X + 0.08 (see below). The mean of
the differences (bias) and ±2 s are shown. The correlation coefficients for the
scatter and difference data were 0.748 and 0.019, respectively, illustrating the
gain in “resolution” of the differences.

The mechanics behind the design of the difference graph
can be understood as subtracting Y = X from the regression
Y = bX + a, i.e., forming a new function where Y represents
the difference and X still represents the comparative method:

$$Y - Y_i = b \times X - X_i + a; \qquad Y = X \times (b - 1) + a \tag{187}$$
As seen from this formula, the regression function of
the differences will have a slope of b − 1, i.e., 1 (tan(45°))
less than that of the original observations. In other words,
the data of the difference graph appear tilted about 45°
clockwise in relation to the original regression function.
Consequently, the equal line of the original will be
represented by the X-axis, whereas the intercept (a) is
unchanged (Figure 12). The correlation coefficient will be
decreased in comparison with that of the original and thus
differences appear enhanced. There is no unique new
information in the difference graph but a visual
enhancement.
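
The relation between the two regression functions can be checked numerically. The following is a minimal sketch in Python (standard library only); the paired results and the helper name `ols` are hypothetical and serve only to illustrate that regressing the differences on the comparative method lowers the slope by exactly 1 and leaves the intercept unchanged.

```python
def ols(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical paired results: comparative method (x) and test method (y)
x = [1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6]
y = [1.55, 1.90, 2.00, 2.30, 2.45, 2.75, 2.85]

b, a = ols(x, y)                            # original regression Y = bX + a
diff = [yi - xi for xi, yi in zip(x, y)]    # differences plotted against x
b_diff, a_diff = ols(x, diff)               # regression of the difference graph

print(f"original:   slope {b:.3f}, intercept {a:.3f}")
print(f"difference: slope {b_diff:.3f}, intercept {a_diff:.3f}")
```

The same check applies to any data set, since subtracting X from Y only removes the unit slope of the equal line.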

If the differences are normally distributed, the mean differ¬ 
ence and its standard deviation can be calculated and dis¬ 
played in the difference graph (Figure 12) and compared 
with target values. 

Typical questions that are answered by the difference graph,
in addition to mean and dispersion, are whether the difference
increases or decreases with the concentration and whether the
dispersion seems to be constant or to change with concentration.

A complementary graph (Figure 13), based on the cumulative
empirical distribution function for the differences, may be
superimposed on the difference graph to better illustrate the
distribution of the differences. The empirical distribution
function will coincide with the sigmoid cumulative curve of
a normal distribution if the number of observations is
sufficiently large. To create an empirical distribution function,
the differences are ranked, any ties resolved, and the
percentiles of the ranks calculated. This produces a sigmoid which
will, however, be liable to effects of a limited number of
observations. The obtained distribution can be compared with what
would be expected from a normal distribution, which would be
just another application of the Q-Q plot.

The function is then mirrored around the 50 % percentile
(the median), which results in a mountain-shaped curve. This
can then be tilted 90° clockwise and superimposed on the
difference graph (Figure 13).

FIGURE 13 The cumulative, empirical distribution function
superimposed on the difference graph. The peak of the tilted
mountain coincides with the median of the differences. The dotted
vertical and horizontal lines correspond to the 2.5 and 97.5
percentiles, i.e., the central 95 % of the observations, centered
round the median. Compare with Figure 5.
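
The ranking, percentile, and mirroring steps can be sketched as follows. This is an illustrative Python example only: the differences are hypothetical, the function name `mountain_plot_points` is not from the text, and the percentile convention (rank − 0.5)/n is one of several in use and is an assumption here.

```python
import statistics

def mountain_plot_points(differences):
    """Return the sorted differences and their folded percentiles."""
    ordered = sorted(differences)
    n = len(ordered)
    # Percentile of each rank; ties are resolved by the sort order.
    # The (rank - 0.5) / n convention is an assumption of this sketch.
    pct = [100.0 * (rank - 0.5) / n for rank in range(1, n + 1)]
    # Mirror the cumulative curve around the 50th percentile (the median)
    folded = [p if p <= 50.0 else 100.0 - p for p in pct]
    return ordered, folded

# Hypothetical differences between two measurement procedures
diffs = [-0.12, -0.05, -0.03, 0.00, 0.02, 0.04, 0.07, 0.10, 0.21]

ordered, folded = mountain_plot_points(diffs)
peak = ordered[folded.index(max(folded))]   # difference at the top of the mountain
print(peak, statistics.median(diffs))
```

Plotting `folded` against `ordered` gives the mountain-shaped curve; its peak falls on the median of the differences.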


PERFORMANCE CHARACTERISTICS 


Definitions 

In a dichotomous decision situation, i.e., with only two
alternatives, such as deciding whether a person has a given
disease or condition or not, there are four possible outcomes.
These can be defined in a 2 x 2 frequency table in which the
individuals belonging to each category of an independent
classification (healthy and nonhealthy, respectively) are
placed in separate rows. The numbers of individuals or items
that test negative and positive, respectively, are reported
in the columns (Table 14 and Figure 14).

Agreement and disagreement between the diagnostic test results
and the independently found diagnosis or property are recorded
as true positive = TP, true negative = TN, false positive = FP,
and false negative = FN.

The performance can be expressed as diagnostic (nosographic)
sensitivity, specificity, predictive value of a positive
result = PV(+), and predictive value of a negative
result = PV(−):

$$\text{Diagnostic Sensitivity (Sens)} = \frac{TP}{TP + FN} \tag{188}$$

TABLE 14 Contingency Table or 2 x 2 Table for Classification of
Dichotomous Results Assuming a Positive Test Result for Condition 1 and
Negative for Condition 2

              Negative outcome        Positive outcome
              of test                 of test
Condition 1   False negative (FN)     True positive (TP)      Sensitivity = TP/(TP + FN)
Condition 2   True negative (TN)      False positive (FP)     Specificity = TN/(TN + FP)
              PV(−) = TN/(TN + FN)    PV(+) = TP/(TP + FP)    Efficiency = (TP + TN)/(TP + TN + FP + FN)



FIGURE 14 Illustration of the contingency table in Table 14.
Condition 1 (e.g., a diseased group) is above the X-axis and
condition 2 (e.g., a nondiseased group) below. The vertical line
is the “cutoff” of a diagnostic marker. The frequency
distributions are idealized; in particular, the distribution of
the diseased group would be skewed to the right.


$$\text{Diagnostic Specificity (Spec)} = \frac{TN}{TN + FP} \tag{189}$$

The sensitivity and specificity are changed by changing the
cutoff between categories (e.g., considered healthy and
nonhealthy), reference value, or decision value. In the table,
this would imply changing the relation between the number of
items in the columns. An increase in one of the quantities will
invariably cause a decrease in the other (cumulative
distribution analysis, CDA test; Figure 16).

$$PV(+) = \frac{TP}{TP + FP} \tag{190}$$

$$PV(-) = \frac{TN}{TN + FN} \tag{191}$$

$$\text{Efficiency} = \frac{TP + TN}{TP + TN + FP + FN} = Sens \times Prev + Spec \times (1 - Prev) = Prev \times (Sens - Spec) + Spec \tag{192}$$


The efficiency is thus a linear function of the prevalence
of disease, with slope (Sens − Spec) and intercept Spec.

Efficiency is also known as “Index of validity,” “Agreement,”
or Accuracy.

“Index of agreement” is defined as kappa (κ) (237).


$$\text{Prevalence of disease (pre-test probability; Prev)} = \frac{TP + FN}{TP + TN + FP + FN} \tag{193}$$


Note Expressions (190)-(192) depend on the prevalence of 
disease (193), whereas Equations (188) and (189) are 
characteristics of a diagnostic procedure that is used for a 
specific purpose, using defined discriminators (e.g., cutoffs). 
They may thus be regarded as constants in that context. 
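
As an illustration of Equations (188)-(193), the quantities can be computed directly from the four cells of the 2 x 2 table. The sketch below is a minimal Python example; the function name `performance` is an assumption, and the counts are those of the hypothetical table A used in the kappa example later in the text.

```python
def performance(tp, fn, fp, tn):
    """Performance characteristics from the four cells of a 2 x 2 table."""
    n = tp + fn + fp + tn
    return {
        "sens": tp / (tp + fn),          # Equation (188)
        "spec": tn / (tn + fp),          # Equation (189)
        "pv_pos": tp / (tp + fp),        # Equation (190)
        "pv_neg": tn / (tn + fn),        # Equation (191)
        "efficiency": (tp + tn) / n,     # Equation (192), first form
        "prev": (tp + fn) / n,           # Equation (193)
    }

# Hypothetical counts (table A of the kappa example)
r = performance(tp=10, fn=5, fp=5, tn=80)

# Equation (192), last form: efficiency expressed through the prevalence
via_prev = r["prev"] * (r["sens"] - r["spec"]) + r["spec"]

for name, value in r.items():
    print(f"{name}: {value:.3f}")
print(f"efficiency via prevalence: {via_prev:.3f}")
```

Both routes to the efficiency give the same value, which shows how the efficiency depends on the prevalence once sensitivity and specificity are fixed.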


Bayes’ Theorem 

The theorem describes a method to estimate the post-test 
probability by applying key characteristics of an investigation 
to the pretest probability. 

$$\text{Likelihood ratio (+) (LR(+))} = \frac{\text{Sensitivity}}{1 - \text{Specificity}} \tag{194}$$

$$\text{Likelihood ratio (-) (LR(-))} = \frac{1 - \text{Sensitivity}}{\text{Specificity}} \tag{195}$$

The LR(+) and LR(—) are also known as Bayes' factors. 


$$\text{Odds} = \frac{\text{Probability}}{1 - \text{Probability}} \tag{196}$$

$$\text{Probability (Prob)} = \frac{\text{Odds}}{1 + \text{Odds}} \tag{197}$$

Pre-test probability = Prevalence of disease, see Equation (193)

$$\text{Pre-test odds} = \frac{\text{Prevalence}}{1 - \text{Prevalence}} \tag{198}$$

$$\text{Post-test odds} = \text{pre-test odds} \times LR \tag{199}$$


This relation summarizes the Bayes' theorem. 


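$$\text{Post-test probability } [PV(+) \text{ or } PV(-)] = \frac{\text{Post-test odds}}{1 + \text{Post-test odds}} \tag{200}$$

Probability of disease if the test is positive (PV(+)):

$$PV(+) = \frac{Sens \times Prev}{Sens \times Prev + (1 - Spec) \times (1 - Prev)} \tag{201}$$

Probability of no disease if the test is negative (PV(−)):

$$PV(-) = \frac{Spec \times (1 - Prev)}{Spec \times (1 - Prev) + (1 - Sens) \times Prev} \tag{202}$$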


Note The post-test probability for a positive and negative
result is directly given in the 2 x 2 table as PV(+) (190) and
PV(−) (191), respectively.

The post-test probability can be estimated from the
prevalence of disease, sensitivity, and specificity as shown in
formulas (201) and (202). A stepwise procedure requires
first calculating the LR(+) and LR(−), the pre-test
odds, and the post-test odds. A post-test probability is then
obtained from Equation (200) for PV(+) and [1 − (200)] for
PV(−).
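
The stepwise procedure can be sketched in a few lines of Python. The function name `post_test_probability` is an assumption of this sketch; the input values (sensitivity 0.95, specificity 0.85, prevalence 6 %) are those of the glucose example given later in this chapter.

```python
def post_test_probability(prev, lr):
    """Equations (198)-(200): pre-test odds, post-test odds, probability."""
    pre_odds = prev / (1 - prev)            # Equation (198)
    post_odds = pre_odds * lr               # Equation (199)
    return post_odds / (1 + post_odds)      # Equation (200)

sens, spec, prev = 0.95, 0.85, 0.06

lr_pos = sens / (1 - spec)                  # Equation (194)
lr_neg = (1 - sens) / spec                  # Equation (195)

pv_pos = post_test_probability(prev, lr_pos)
pv_neg = 1 - post_test_probability(prev, lr_neg)

# Direct route, Equations (201) and (202)
pv_pos_direct = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
pv_neg_direct = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

print(f"LR(+) = {lr_pos:.2f}, LR(-) = {lr_neg:.2f}")
print(f"PV(+) = {pv_pos:.2f}, PV(-) = {pv_neg:.2f}")
```

The stepwise and the direct calculations agree, which is the content of Bayes' theorem as summarized in Equation (199).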

An online calculator for quantities related to Bayesian logics 
is available at http://araw.mede.uic.edu/cgi-bin/testcalc.pl. 

The relation between the pre-test probability and the
post-test probability can be visually displayed in the Fagan
nomogram (Figure 15) in which information on the pre-test
probability (prevalence of disease) and the likelihood ratio
will indicate the post-test probability and thus the gain by
performing the test.

FIGURE 15 Fagan nomogram. Draw a straight line between the
pre-test probability and the LR(+) and read the post-test
probability on the right-hand scale.

An LR(+) of 1 indicates that there is no gain in performing the
test. Often an LR(+) of at least three is required in clinical
work.

Risk ratio = relative risk (RR):

$$RR = \frac{I_e}{I_{ne}} \tag{203}$$

where I_e is the incidence among exposed individuals and I_ne the
incidence among not exposed individuals.


Receiver Operating Characteristics

The diagnostic sensitivity (Y) plotted against (1 − diagnostic
specificity) (X) for many chosen cutoff or reference values
is the receiver operating characteristics (ROC) curve (Figure 16).

This is the same as plotting (1 − β) as the dependent variable
(Y) against α as the independent variable (X).

The ROC curve is thus a graphical representation of the
trade-off between the false-negative and false-positive rates
for every possible quantity value. Equivalently, the ROC curve
is the representation of the trade-offs between sensitivity and
specificity. By tradition, the plot shows the false-positive rate
(α) (1 − specificity) on the X-axis and (1 − the false-negative
rate), i.e., (1 − β) (sensitivity) on the Y-axis.

Each point on the ROC curve also represents the likelihood ratio
LR(+) (194) for the corresponding cutoff value, as the ratio of
its Y- to its X-coordinate.

The sensitivity and specificity concepts as defined above are
most suited for binary or dichotomous situations, i.e., a “yes”
or “no” answer. This limitation carries over to the ROC curve.
A further limitation of the ROC curve is that it disregards the
prevalence of the condition.

Youden index.

If sensitivity and specificity are equally important, the Youden
index [J] will indicate the performance (the higher the better) at a
given cutoff. J is sometimes used to define an optimal cutoff (c).

$$J = \max_c \left( \text{sensitivity}_c + \text{specificity}_c - 1 \right) \tag{204}$$

The maximum value of the Youden index is 1 (perfect test)
and the minimum is 0 when the test has no diagnostic value.
The value of one [1] will be achieved when the test and
comparative populations are completely separated (see Figure 14).

FIGURE 16 ROC and CDA curves. In the ROC curve, the quarter circle
represents the K-index and the vertical line the J-index. Also
shown are the sensitivity = specificity line and its perpendicular,
the theoretical optimum. The vertical line in the CDA plot shows
the sensitivity, specificity, and LR(+) at the chosen cutoff.

The optimal outcome would be when the sensitivity is equal 
to 1, and at the same time, the specificity also equals 1, i.e., the 
false-positive rate (1 — specificity ) is zero (0). The point repre¬ 
senting this combination will be in the upper left corner of 
the graph. The closer a ROC curve is to this ideal situation, 
the better the marker performs, given that sensitivity and spec¬ 
ificity are of equal diagnostic importance. This is another way 
of expressing the Youden index (204). 

The ROC curve is a summary of the information, and as
such, some information is lost, particularly the value of each
cutoff. The CDA (Figure 16, right) displays the sensitivity
and specificity against the cutoff values on the X-axis, which
is another compromise that addresses this loss.
The CDA is thus a more useful tool to choose and describe
the effect of a particular cutoff. It also demonstrates the
influence of changing the cutoff value.


Example 

B-Glucose concentrations were measured in 200 patients
aged 40-60 years. In this age group, the prevalence of disease
was estimated at 6 % by an independent method. The specificity
was estimated at 0.85 and the sensitivity at 0.95. The diagram
below (Figure 17) illustrates the possible outcome: the test
increases the pre-test probability of 6 % to a post-test
probability of 29 %. The PV(−) is about 1 and thus the test is
useful for rule-out at this prevalence.


Area Under the Curve 

The area under the ROC curve (AUC) summarizes the
performance. If the sum of the sensitivity and the specificity
equals one (true-positive rate = false-positive rate), i.e., the
area under the curve (AUC) = 0.5 and the ROC curve follows the
diagonal, then the performance is no better than chance. Compare
LR(+) equal to one [1] in the Fagan nomogram (Figure 15).

[Figure 17 panel values. Sensitivity: 0.9501; Specificity: 0.8501;
Prevalence: 0.06; X-axis: P-Glucose subst conc, mmol/L.
Pre odds(+): 0.06; Likelihood ratio(+): 6.3; Post odds(+): 0.40;
PV(+): 0.29; Likelihood ratio(−): 0.06; PV(−): 1.00;
Efficiency: 0.86; Kappa: 0.39; Area under curve: 0.90]

FIGURE 17 Simultaneous display of the relation between diagnostic
performance characteristics and the prevalence of disease. Imagine
the violet vertical line moving horizontally and read the outcome.

An approximate value of AUC can be estimated by adding
the area of trapeziums (a rectangle with a triangle on top)
formed by connecting consecutive points. Thus, if
(1 − specificity) of two adjacent observations is X_1 and X_2 and
the corresponding sensitivity Y_1 and Y_2 for the points limiting
the trapezium, then its area (A_{1-2}) is

$$A_{1-2} = Y_1 \times (X_1 - X_2) + \frac{(Y_2 - Y_1) \times (X_1 - X_2)}{2} = \frac{(X_1 - X_2) \times (Y_1 + Y_2)}{2} \tag{205}$$

The AUC is obtained by adding the individual A(X_i − X_(i+1)).
The more trapeziums that are identified and defined (i.e., the
smaller the difference X_1 − X_2), the better is the estimate.
It should be recognized that at lower specificities, an undue
contribution to the area is made by the area of the trapeziums
under the equal line. AUC calculation is offered in many
software packages.
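
The summation of trapeziums in Equation (205) can be sketched as follows. This is an illustrative Python example; the function name `auc_trapezoid` and the ROC points are hypothetical. The points are sorted by increasing (1 − specificity), so the width of each trapezium is taken as a positive difference.

```python
def auc_trapezoid(roc_points):
    """Sum the trapezium areas of Equation (205).

    roc_points: iterable of (1 - specificity, sensitivity) pairs.
    """
    pts = sorted(roc_points)
    area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:]):
        area += (x2 - x1) * (y1 + y2) / 2   # width times mean height
    return area

# Hypothetical ROC points, including the end points (0, 0) and (1, 1)
roc = [(0.0, 0.0), (0.01, 0.30), (0.03, 0.60), (0.10, 0.85),
       (0.30, 0.95), (0.60, 0.99), (1.0, 1.0)]

print(f"AUC = {auc_trapezoid(roc):.3f}")
print(f"chance line: {auc_trapezoid([(0.0, 0.0), (1.0, 1.0)]):.1f}")
```

A curve that follows the diagonal gives 0.5, and a curve through the upper left corner gives 1.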


Reference Values 

The information derived from the ROC curve can also be 
used to select a reference value or decision value depending 
on the priority given to the sensitivity or specificity in a given 
clinical setting. Since the theoretical maximal efficiency occurs 
when sensitivity = specificity = 1, i.e., the upper left corner of the 
ROC curve, the cutoff corresponding to a minimized distance d 
(K-index) between the potential reference value and the corner 
would be an optimal compromise: 

$$d = \sqrt{(1 - sens)^2 + (1 - spec)^2} \tag{206}$$

In a ROC curve, the d will be represented by a quarter of a 
circle since there are many solutions to Pythagoras' theorem 
with only d defined (206) (see Figure 16). 
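
The Youden index (204) and the distance d (206) can both be evaluated for a series of candidate cutoffs. The following minimal Python sketch uses hypothetical cutoffs with hypothetical sensitivities and specificities; the function names are assumptions made for illustration.

```python
import math

# (cutoff, sensitivity, specificity): hypothetical values
points = [
    (5.0, 0.99, 0.40),
    (6.0, 0.95, 0.70),
    (7.0, 0.85, 0.90),
    (8.0, 0.60, 0.97),
    (9.0, 0.30, 0.99),
]

def youden(sens, spec):
    """Term maximized in Equation (204)."""
    return sens + spec - 1

def k_index(sens, spec):
    """Distance to the upper left corner of the ROC plot, Equation (206)."""
    return math.hypot(1 - sens, 1 - spec)

best_by_j = max(points, key=lambda p: youden(p[1], p[2]))
best_by_d = min(points, key=lambda p: k_index(p[1], p[2]))

print("cutoff with the highest Youden index:", best_by_j[0])
print("cutoff with the smallest distance d:", best_by_d[0])
```

In this example both criteria select the same cutoff; with other data they may differ, since they weigh sensitivity and specificity slightly differently.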

A high sensitivity will tend to rule out disease in a decision
situation (when the result is negative) and a high specificity
will rule in (when the result is positive), but the outcome is
also influenced by the prevalence of disease (see Figure 17).

A high sensitivity (rule out) is often preferred for screening
purposes, if followed by a procedure with high specificity.


ESTIMATION OF MINIMAL SAMPLE SIZE 
(POWER ANALYSIS) 


Error Types 

To determine the power of a test and the minimal sample
size, one has to consider the null hypothesis, the probability
of a false-positive result (α), the probability of a
false-negative result (β), the dispersion of the sample results,
the assumed difference between results, and whether a one- or
two-tailed statistical analysis is planned.

Thus, the null hypothesis can either be true or false and we can
make two types of error: rejecting it when it is true, i.e., the
type I or α-error, and not rejecting it when it is false, i.e.,
the type II or β-error.

The type II error, β-error, is when the difference overlaps
with the values that would not be regarded as a difference.
This can only be one-sided.

The p-value of a significance test equals the probability of
obtaining a result at least as extreme as that observed, given
that the null hypothesis is true. For instance, if the null
hypothesis is that two values are the same, a high p-value will
be supportive, whereas a small one (usually p < 0.05) will not,
and the values are regarded as different at the 95 % confidence
level (two-sided). There is then a 5 % probability (α = 0.05)
that we make a mistake in this judgment.

Consider two populations whose averages differ by d and
which overlap to a certain degree. The null hypothesis states
that there is no difference between the populations, i.e.,
d = 0. However, a tail of one of the distributions (α) coincides
with the other population and a tail of this (β) coincides with
the first. The tails are defined by a “cutoff.” Therefore, if one
increases, the other will decrease and the analogy with the
diagnostic specificity (1 − α) and diagnostic sensitivity (1 − β)
is obvious (see above):

$$\text{Specificity} = 1 - \alpha \tag{207}$$

and

$$\text{Sensitivity} = 1 - \beta \tag{208}$$

where α is the probability of a false-positive result and β is
the probability of a false-negative result. That is, α is the
false-positive rate and β is the false-negative rate,
usually linked to discussions on the probability of identifying
differences (see below).

Power of a Test 

Power is defined as the probability that a statistical test will
reject the null hypothesis when it is false. Therefore, power is
equal to 1 − β, or sensitivity. A commonly accepted
power is 0.80, i.e., β = 0.20.

α = "probability of falsely accepting the alternate hypothesis,"
i.e., rejecting the null hypothesis when true (209)

β = "probability of falsely accepting a null hypothesis,"
i.e., not rejecting the null hypothesis when it is false (210)

1 − β = "power" (211)

The acceptable size of the ratio β/α is conditional on the
purpose of the power analysis; if false positives are critical,
increase the ratio, e.g., by decreasing α; if false negatives
are more important, reduce the ratio. An acceptable ratio in
clinical practice is often set to

$$\frac{\beta}{\alpha} = \frac{0.20}{0.05} = 4 \tag{212}$$

Sample Size 




(213) 


The same is valid for proportions (p). If the proportion is not 
known, use p = 0.5 that has the highest standard deviation of 
all proportions. 

There are other rules to estimate the sample size. If we want 
to estimate the sample size from the acceptable standard error 
of the mean s(x), this formula can be used: 



(214) 


Example 

Suppose the method uncertainty (u(x)) is 20 and the
acceptable standard error of the mean s(x̄) = 10, then a
reasonable number of observations is 16 with 95 % confidence
(α = 0.05; z = 2):

$$n = \left( \frac{2 \times 20}{10} \right)^2 = 16$$

The reasoning is slightly different when estimating the necessary
number of observations to identify a difference between two
results ($\bar{x}_1$ and $\bar{x}_2$).


Example 

Assume the same number of observations in the two groups
(n) and the same variance (s(x))² of the results. This leads
toward a Student's independent t-test. If the difference of
the results of measurements is expressed in standard
deviations (cf. z-score (51)), the expression is

$$d = \frac{\bar{x}_1 - \bar{x}_2}{s(x)} \tag{215}$$


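The necessary number of observations in each sample is

$$n = \frac{2 \times (z_{(1-\alpha/2)} + z_{(1-\beta)})^2}{\left( \frac{\bar{x}_1 - \bar{x}_2}{s(x)} \right)^2} = \frac{2 \times s(x)^2 \times (z_{(1-\alpha/2)} + z_{(1-\beta)})^2}{(\bar{x}_1 - \bar{x}_2)^2} = \frac{2 \times (z_{(1-\alpha/2)} + z_{(1-\beta)})^2}{d^2} \tag{216}$$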


If α = 0.05 and β = 0.2, i.e., β/α = 4 (see above), z_(1−α/2)
and z_(1−β) are 1.96 (in EXCEL: NORMSINV(1 − α/2)) and 0.84
(NORMSINV(1 − β)), respectively. The numerator will be
15.7, which is rounded to 16, and a simplified formula will be

$$n = \frac{16}{\left( \frac{\bar{x}_1 - \bar{x}_2}{s(x)} \right)^2} = \frac{(s(x))^2 \times 16}{(\bar{x}_1 - \bar{x}_2)^2} = \frac{16}{d^2} = \frac{16}{(z\text{-score})^2} \tag{217}$$

where $\bar{x}_1$ and $\bar{x}_2$ are the sample means, s(x) is
the standard deviation of both samples, and n the number of
observations in each sample. The factor 16 refers to a
two-sample situation; in a one-sample situation the factor will be 8.
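
The origin of the factor 16 can be verified numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of the EXCEL function NORMSINV; the function name `n_per_group` is an assumption made for this illustration.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, beta=0.20):
    """Equation (216): observations per group to detect a difference d
    expressed in standard deviations (two-sided alpha, power 1 - beta)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
    z_beta = NormalDist().inv_cdf(1 - beta)         # about 0.84
    return 2 * (z_alpha + z_beta) ** 2 / d ** 2

factor = n_per_group(1.0)               # numerator of Equation (216)
n_half_sd = math.ceil(n_per_group(0.5)) # difference of half a standard deviation

print(f"numerator: {factor:.1f}")
print(f"n per group for d = 0.5: {n_half_sd}")
```

Rounding the numerator to 16, as in Equation (217), gives a slightly larger and thus conservative sample size.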

The detectable difference can be calculated from
Equation (214), if inverted:

$$d^2 = \frac{16}{n}; \qquad d = \frac{4}{\sqrt{n}} \tag{218}$$

This is the difference that can be detected with 95 %
confidence for a given sample size.

If there are only two observations, the minimal difference (MD)
will be

$$d = \frac{4}{\sqrt{2}} \approx 2.83 = \frac{\bar{x}_1 - \bar{x}_2}{s(x)}; \qquad \bar{x}_1 - \bar{x}_2 = 2.83 \times s(x) \tag{219}$$

Compare the combined uncertainty of the difference
between two observations with the same uncertainty,
$z_{(1-\alpha/2)} \times u(A) \times \sqrt{2} \approx 2.77 \times u(A)$ (cf. MD (80)).
This should be exceeded to indicate a significant difference
between the results. The factor 4 is an approximation and
represents $\sqrt{2 \times (1.96 + 0.84)^2} = \sqrt{15.68} \approx 3.96$,
changing the factor in Equation (219) to 2.80.


Sample Size If Given the %CV 

If the %CV is the same for both methods, the relative
difference is first estimated as

$$RD = \frac{x_1 - x_2}{\frac{(x_1 + x_2)}{2}} \tag{220}$$

and the number of samples in each group

$$n = \frac{16 \times CV^2}{RD^2} \tag{221}$$


Example 

If the relative difference to be detected is, e.g., 20 % and the
relative uncertainty (coefficient of variation: %CV) 30 %, the
following is obtained:

$$n = \frac{16 \times \%CV^2}{RD^2} = \frac{16 \times 0.3^2}{\left( \frac{1 - 0.8}{\frac{(1 + 0.8)}{2}} \right)^2} = \frac{1.44}{0.049} \approx 29$$

If the relevant difference to be detected is 10 %, the number
of observations needs to be about 130. If the comparison is
with a standard, i.e., only one group is needed, then the factor
in the numerator is 8.

If the averages are known ($\bar{x}_1$ and $\bar{x}_2$) and the
%CV specified, then Equation (221) can be written as

$$n = \frac{16 \times (CV)^2}{(\ln(\bar{x}_1) - \ln(\bar{x}_2))^2} \tag{222}$$
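
The worked example can be reproduced with a short Python sketch. The function names are assumptions of this illustration; the inputs (CV 30 %, differences of 20 % and 10 %) are those of the example above.

```python
import math

def relative_difference(x1, x2):
    """Equation (220): difference relative to the mean of the two values."""
    return (x1 - x2) / ((x1 + x2) / 2)

def n_from_cv(cv, rd, factor=16):
    """Equation (221); use factor=8 when comparing with a standard."""
    return factor * cv ** 2 / rd ** 2

def n_from_logs(cv, x1, x2, factor=16):
    """Equation (222): the same estimate written with logarithms."""
    return factor * cv ** 2 / (math.log(x1) - math.log(x2)) ** 2

n_20 = n_from_cv(0.30, relative_difference(1.0, 0.8))   # 20 % difference
n_10 = n_from_cv(0.30, relative_difference(1.0, 0.9))   # 10 % difference
n_20_log = n_from_logs(0.30, 1.0, 0.8)

print(f"n (20 %): {n_20:.1f}")
print(f"n (10 %): {n_10:.1f}")
print(f"n (20 %, log form): {n_20_log:.1f}")
```

Halving the difference to be detected roughly quadruples the number of observations needed in each group.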


Number Needed to Treat or Harm 

Define EE and CE as the number of Events in the Experimental
and Control groups, respectively, and EN and CN as
the number of nonevents in the Experimental and Control
groups, respectively. Let ES and CS be the number of subjects
in the Experimental and Control groups, respectively.
EER is the Experimental Event Rate and CER the Control
Event Rate; then

$$ES = EE + EN; \qquad CS = CE + CN;$$

$$EER = \frac{EE}{ES} \quad \text{and} \quad CER = \frac{CE}{CS} \tag{223}$$

Relative risk (risk ratio cf. 203):

$$RR = \frac{EER}{CER} \tag{224}$$

and


Experimental event odds:

$$EEO = \frac{EE}{EN} \tag{225}$$

Control event odds:

$$CEO = \frac{CE}{CN} \tag{226}$$





Odds ratio:

$$\frac{EEO}{CEO} = \frac{\frac{EE}{EN}}{\frac{CE}{CN}} = \frac{EE \times CN}{CE \times EN} = \frac{TP \times TN}{FN \times FP} \tag{227}$$

$$\text{Efficacy of treatment} = 1 - RR \tag{228}$$

Number needed to treat or harm:

$$\frac{1}{EER - CER} \tag{229}$$

if (229) < 0 then NNT (Number Needed to Treat);
if (229) > 0 then NNH (Number Needed to Harm)

The larger the absolute value of the NNT, the less efficient is 
the treatment. 
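
Equations (223)-(229) can be combined in a short calculation. The following Python sketch is illustrative only: the function name `treatment_effect` and the counts are hypothetical, and the case EER = CER (no effect, division by zero in (229)) is deliberately not handled.

```python
def treatment_effect(ee, en, ce, cn):
    """Event rates, relative risk, odds ratio, and NNT/NNH from counts."""
    eer = ee / (ee + en)                 # Equation (223), ES = EE + EN
    cer = ce / (ce + cn)                 # Equation (223), CS = CE + CN
    rr = eer / cer                       # Equation (224)
    odds_ratio = (ee * cn) / (ce * en)   # Equation (227)
    efficacy = 1 - rr                    # Equation (228)
    nn = 1 / (eer - cer)                 # Equation (229); undefined if EER = CER
    return eer, cer, rr, odds_ratio, efficacy, nn

# Hypothetical trial: 10 events among 100 treated, 20 events among 100 controls
eer, cer, rr, odds_ratio, efficacy, nn = treatment_effect(ee=10, en=90, ce=20, cn=80)

label = "NNT" if nn < 0 else "NNH"
print(f"RR = {rr:.2f}, odds ratio = {odds_ratio:.2f}, efficacy = {efficacy:.2f}")
print(f"{label} = {abs(nn):.0f}")
```

With these counts, the negative sign of (229) identifies the result as an NNT, and its absolute value of about 10 is the number of subjects to be treated to avoid one event.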


AGREEMENT BETWEEN CATEGORICAL
ASSESSMENTS (KAPPA (κ)-STATISTICS)


This problem is faced when two measurement procedures 
which report results on an ordinal scale are compared and 
the number of agreeing results can be organized in a cross¬ 
table (contingency table, frequency table). It is also used when 
observers are categorizing patients or events into two or sev¬ 
eral groups, e.g., two or more experts evaluating test results. 
A special case is the 2x2 frequency table where only two 
groups are considered (see below). 

Any number of categories (groups) can be studied. The table 
is characterized by the same number of rows and columns. 

Example 

Enter the number of observations in each group for each
observer in the appropriate cells (Table 15):

Proportion of agreement by chance (expected):

$$P_c = \frac{11 \times 21 + 12 \times 22 + 13 \times 23 + 14 \times 24}{N^2} \tag{230}$$

where 11-14 and 21-24 denote the column and row totals in
Table 15.

Proportion of observed agreement:

$$P_o = \frac{A + F + K + Q}{N} \tag{231}$$

TABLE 15 Notation in the Example

                      Method (Observer) 1
Method (observer) 2   Group 11   Group 12   Group 13   Group 14   Total
Group 21              A          B          C          D          21
Group 22              E          F          G          H          22
Group 23              I          J          K          L          23
Group 24              M          O          P          Q          24
Total                 11         12         13         14         Total (N)


$$\text{Kappa } (\kappa) = \frac{P_o - P_c}{1 - P_c} = 1 - \frac{1 - P_o}{1 - P_c} \tag{232}$$

Note The κ-value is influenced by the bias between the
observers defined as the difference between Method
(Observer) A and Method (Observer) B in their assessment of
the frequency of occurrence of a condition. A high κ-value
indicates a high degree of concordance between observers.


In a 2 x 2 table, the bias can be quantified as the Bias Index, BI
(symbols in the example are retained, assuming that all input
cells except A, B, E, and F are zero [0]):

$$BI = \frac{A + B}{N} - \frac{A + E}{N} = \frac{B - E}{N} \tag{233}$$

thus reflecting the difference between cells of disagreement B
and E.

BI results can take values between (−1) and (+1).

Kappa (κ) can be corrected for the bias, BAK:

$$BAK = 1 - \frac{1 - P_o}{1 - \frac{(11 + 21)^2 + (12 + 22)^2}{4 \times N^2}} \tag{234}$$

The value of κ is also affected by the relative probabilities of
the “Yes” and “No” answers (in a 2 x 2 table). This is called
the Prevalence Index, PI:

$$PI = \frac{A - F}{N} \tag{235}$$

and is the difference between cells of agreement A and F. PI
ranges from (−1) to (+1).

A kappa (κ) value that is both prevalence and bias adjusted is
PABAK:

$$PABAK = 2 \times P_o - 1 \tag{236}$$

PABAK ranges from (−1) to (+1) like kappa (κ), but the
interpretation may be different from that of the uncorrected kappa
(κ) value.

The value of κ depends on all these indexes:

$$\kappa = \frac{PABAK - PI^2 + BI^2}{1 - PI^2 + BI^2} \tag{237}$$

Clearly, if either of BI or PI takes on extreme values, the
interpretation of the κ-value is difficult.
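
The relation between κ and the three indexes can be checked numerically for a 2 x 2 table. The following Python sketch uses the cell notation A, B, E, and F of Table 15; the counts and the function name `kappa_indices` are hypothetical.

```python
def kappa_indices(a, b, e, f):
    """Kappa, BI, PI, and PABAK for a 2 x 2 table with cells A, B, E, F."""
    n = a + b + e + f
    p_o = (a + f) / n                                       # Equation (231)
    # Column totals (A + E), (B + F) and row totals (A + B), (E + F)
    p_c = ((a + e) * (a + b) + (b + f) * (e + f)) / n ** 2  # Equation (230)
    kappa = (p_o - p_c) / (1 - p_c)                         # Equation (232)
    bi = (b - e) / n                                        # Equation (233)
    pi = (a - f) / n                                        # Equation (235)
    pabak = 2 * p_o - 1                                     # Equation (236)
    kappa_from_indices = (pabak - pi ** 2 + bi ** 2) / (1 - pi ** 2 + bi ** 2)
    return kappa, bi, pi, pabak, kappa_from_indices

# Hypothetical counts: observers agree in 80 of 100 cases
kappa, bi, pi, pabak, kappa_237 = kappa_indices(a=40, b=15, e=5, f=40)

print(f"kappa = {kappa:.3f}, BI = {bi:.2f}, PI = {pi:.2f}, PABAK = {pabak:.2f}")
print(f"kappa from Equation (237) = {kappa_237:.3f}")
```

Both routes give the same κ, which illustrates that BI, PI, and PABAK together carry the same information as the uncorrected κ.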

κ can take any value between −1 and +1: κ = 1 is total
agreement; κ = 0 is the agreement expected by chance; κ < 0
indicates less agreement than expected by chance (rare), with
total disagreement at κ = (−1):

$$se(\kappa) = \sqrt{\frac{P_o \times (1 - P_o)}{n \times (1 - P_c)^2}} \tag{238}$$


Agreement in a 2 X 2 Table 

The efficiency of a diagnostic test is described in a 2 x 2 table
by the sensitivity (188) and specificity (189) of the test. The
efficiency is the sum of true results relative to all observations
(192). Considering the influence of chance on the efficiency
gives an expected “efficiency”:

$$p_e = \frac{(TP + FN) \times (TP + FP) + (TN + FN) \times (TN + FP)}{N^2} \tag{239}$$

where N is the total number of observations.
The index of agreement, i.e., kappa, is

$$\kappa = \frac{\text{efficiency} - p_e}{1 - p_e} \tag{240}$$

TABLE 16 Evaluation of κ-Values

Kappa (κ)     Agreement
0.00          Poor
0.01-0.20     Slight
0.21-0.40     Fair
0.41-0.60     Moderate
0.61-0.80     Substantial
0.81-1.00     Almost perfect

The agreement between test and diagnosis.


κ is interpreted as the difference between the found
efficiency and the expected relative to that possible, considering
the chance.

κ has been calculated and included in Figure 17. It is
noteworthy how chance decreases the efficiency of a test at high
and low prevalence of disease. This is particularly important
in validating a diagnostic marker, i.e., evaluating it being fit
for purpose (Table 16).

The approximate standard error of κ is

$$se(\kappa) = \sqrt{\frac{\text{Efficiency} \times (1 - \text{Efficiency})}{n \times (1 - p_e)^2}} \tag{241}$$

Example 

Compare two hypothetical 2 x 2 tables, A and B

A                    Test                  B                    Test
            Positive  Negative  Sum                    Positive  Negative  Sum
Diseased    10        5         15         Diseased    20        5         25
Healthy     5         80        85         Healthy     5         70        75
Sum         15        85        100        Sum         25        75        100

Sensitivity          0.67                  Sensitivity          0.8
Specificity          0.94                  Specificity          0.93
Efficiency           0.9                   Efficiency           0.9
Expected efficiency  0.75                  Expected efficiency  0.63
Kappa (κ)            0.61                  Kappa (κ)            0.73

Although the efficiency is the same, the κ-value indicates
a better agreement between test and diagnosis in the
B example, which in this case is also reflected in the
sensitivity.
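
The figures of the example can be recalculated from Equations (239)-(241). The sketch below is a minimal Python illustration; the function name `kappa_2x2` is an assumption, and the counts are those of tables A and B above.

```python
import math

def kappa_2x2(tp, fn, fp, tn):
    """Efficiency, expected efficiency, kappa, and its approximate se."""
    n = tp + fn + fp + tn
    efficiency = (tp + tn) / n                                     # Equation (192)
    p_e = ((tp + fn) * (tp + fp) + (tn + fn) * (tn + fp)) / n ** 2 # Equation (239)
    kappa = (efficiency - p_e) / (1 - p_e)                         # Equation (240)
    se = math.sqrt(efficiency * (1 - efficiency) / (n * (1 - p_e) ** 2))  # (241)
    return efficiency, p_e, kappa, se

eff_a, pe_a, kappa_a, se_a = kappa_2x2(tp=10, fn=5, fp=5, tn=80)   # table A
eff_b, pe_b, kappa_b, se_b = kappa_2x2(tp=20, fn=5, fp=5, tn=70)   # table B

print(f"A: efficiency {eff_a:.2f}, expected {pe_a:.3f}, kappa {kappa_a:.2f}, se {se_a:.2f}")
print(f"B: efficiency {eff_b:.2f}, expected {pe_b:.3f}, kappa {kappa_b:.2f}, se {se_b:.2f}")
```

The two tables share the same efficiency, but the lower expected efficiency in table B leaves more room for agreement beyond chance and therefore gives the higher κ.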











Some Metrological Concepts 


METROLOGY, ACCURACY, TRUENESS, 
AND PRECISION 


quantity 

property of a phenomenon, body, or substance, where the
property has a magnitude that can be expressed as a number
and a reference.

Note 1 A reference can be a measurement unit, a 
measurement procedure, a reference material, or a 
combination of such. 

Note 2 The preferred IUPAC-IFCC format for designations 
of quantities in laboratory medicine is “System—Component; 
kind of quantity.” 


kind of quantity 

aspect common to mutually comparable quantities. 
Note 1 The division of “quantity” according to “kind of 
quantity” is to some extent arbitrary. 


*Extracted from JCGM 200:2012 (VIM 2012): International Vocabulary of
Metrology, Basic and General Concepts and Associated Terms (VIM), 3rd
edition, 2008 version with minor corrections.

This important source of defined terminology is freely downloadable from
http://www.bipm.org/utils/common/documents/jcgm/JCGM_200_2012.pdf.

The terms we present here are not necessarily presented in alphabetical
order. The choice of terms is limited and the reader is advised to consult the
original document. Notes and examples may not be cited in extenso.


Example 1 

The quantities of diameter, circumference, and wavelength 
are generally considered to be quantities of the same kind, 
namely of the kind of quantity called length. 

Example 2 

The quantities heat, kinetic energy, and potential energy are 
generally considered to be quantities of the same kind, namely 
of the kind of quantity called energy. 

Note 2 Quantities of the same kind within a given system 
of quantities have the same quantity dimension. However, 
quantities of the same dimension are not necessarily of the same 
kind. 

quantity value 

value of a quantity, value. 

number and reference, together expressing magnitude of a 
quantity. 

Example 1 Length of a given rod: 5.34 m or 534 cm. 

Example 2 Mass of a given body: 0.152 kg or 152 g. 

Example 3 Celsius temperature of a given sample: —5 °C. 

Example 4 Molality of Pb²⁺ in a given sample of water:
1.76 µmol/kg.

Example 5 Arbitrary amount-of-substance concentration of
lutropin in a given sample of human blood plasma (WHO
International Standard 80/552 used as a calibrator): 5.0 IU/L,
where “IU” stands for “WHO International Unit.”

Note 1 According to the type of reference, a quantity value
is either

a product of a number and a measurement unit (see
Examples 1-4; the measurement unit one is generally not
indicated for quantities of dimension one), or

a number and a reference to a measurement procedure, or

a number and a reference material (see Example 5).

measurement 

process of experimentally obtaining one or more quantity 
values that can reasonably be attributed to a quantity. 

Note 1 Measurement does not apply to nominal properties. 

Note 2 Measurement implies comparison of quantities and
includes counting of entities.

Note 3 Measurement presupposes a description of the
quantity commensurate with the intended use of a
measurement result, a measurement procedure, and a
calibrated measuring system operating according to the
specified measurement procedure, including the
measurement conditions.

measuring system 

set of one or more measuring instruments and often other 
devices, including any reagent and supply, assembled and 
adapted to give information used to generate measured quantity 
values within specified intervals for quantities of specified kinds. 

Note 

A measuring system may consist of only one measuring 
instrument. 

measuring instrument 

device used for making measurements, alone or in conjunction
with one or more supplementary devices.

Note 1 A measuring instrument that can be used alone is a 
measuring system. 

Note 2 A measuring instrument may be an indicating 
measuring instrument or a material measure. 

measurand 

quantity intended to be measured. 

Note 1 The specification of a measurand requires knowledge 
of the kind of quantity, substance carrying the quantity, 
including any relevant component, and the chemical entities 
involved. 

Note 2 The measurement, including the measuring system 
and the conditions under which the measurement is carried 
out, might change the phenomenon, body, or substance such 
that the quantity being measured may differ from the 
measurand as defined. 

Example 1 

The length of a steel rod in equilibrium with the ambient 
Celsius temperature of 23 °C will be different from the length 
at the specified temperature of 20 °C, which is the measurand. 
In this case, a correction is necessary. 
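
The correction mentioned in this example can be sketched in a
few lines of Python. This is only an illustration under stated
assumptions: the expansion coefficient below is an assumed
typical value for steel, not a figure from this text, and the
function name is hypothetical.

```python
# Assumed typical linear thermal expansion coefficient for steel, per kelvin.
# Illustration only; use the coefficient of the actual material.
ALPHA_STEEL_PER_K = 12e-6


def length_at_reference_temperature(measured_length, measured_temp_c,
                                    reference_temp_c=20.0,
                                    alpha_per_k=ALPHA_STEEL_PER_K):
    """Correct a length measured at one temperature to the reference temperature.

    Uses the linear expansion model L(T) = L_ref * (1 + alpha * (T - T_ref)).
    """
    delta_t = measured_temp_c - reference_temp_c
    return measured_length / (1.0 + alpha_per_k * delta_t)


# A rod indicated as 1000.000 mm at 23 °C, corrected to 20 °C.
corrected = length_at_reference_temperature(1000.000, 23.0)
print(f"{corrected:.3f} mm")
```

The corrected value, not the indication at 23 °C, is the one
attributed to the measurand as defined.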

Note 3 In chemistry, “analyte,” or the name of a substance or
compound, is a term sometimes used for “measurand.”
This usage is erroneous because these terms do not refer to
quantities.

Note 4 The measurand is a quantity, e.g., “glucose
concentration in serum” where glucose is the analyte or
component and serum is the system or matrix.


ordinal quantity 

quantity, defined by a conventional measurement procedure,
for which a total ordering relation can be established,
according to magnitude, with other quantities of the same
kind, but for which no algebraic operations among those
quantities exist.

Example 1 Octane number for petroleum fuel. 

Example 2 Subjective level of abdominal pain on a scale 
from 0 to 5. 

Note 1 Ordinal quantities can enter into empirical relations 
only and have neither measurement units nor quantity 
dimensions. Differences and ratios of ordinal quantities have 
no physical meaning. 

Note 2 Ordinal quantities are arranged according to ordinal 
quantity-value scales. 


accuracy

accuracy of measurement.

measurement accuracy.

closeness of agreement between a measured quantity value
and a true quantity value of a measurand.

Note 1 The concept “measurement accuracy” is not a 
quantity and is not given a numerical quantity value. A 
measurement is said to be more accurate when it offers a 
smaller measurement error. 

Note 2 The term “measurement accuracy” should not be 
used for measurement trueness, and the term “measurement 
precision” should not be used for “measurement accuracy,” 
which, however, is related to both these concepts. 

Note 3 “Measurement accuracy” is sometimes understood as 
closeness of agreement between measured quantity values 
that are being attributed to the measurand. 

trueness

measurement trueness. 

trueness of measurement. 

closeness of agreement between the average of an infinite 
number of replicate measured quantity values and a reference 
quantity value. 

Note 1 Measurement trueness is not a quantity and thus 
cannot be expressed numerically, but measures for closeness 
of agreement are given in ISO 5725. 

Note 2 Measurement trueness is inversely related to 
systematic measurement error, but is not related to random 
measurement error. 

Note 3 Measurement accuracy should not be used for 
“measurement trueness” and vice versa. 

bias 

measurement bias. 

estimate of a systematic measurement error. 
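
As a minimal sketch, bias is commonly estimated as the mean of
replicate results minus a reference quantity value. The data
and the function name below are hypothetical.

```python
from statistics import mean


def estimate_bias(measured_values, reference_value):
    """Estimate bias as the mean of replicate results minus the reference value."""
    return mean(measured_values) - reference_value


# Hypothetical replicate results for a reference material assigned 5.0 units.
replicates = [5.1, 5.3, 5.2, 5.4, 5.0]
reference = 5.0

bias = estimate_bias(replicates, reference)
relative_bias_percent = 100.0 * bias / reference
print(f"bias = {bias:.2f}, relative bias = {relative_bias_percent:.1f} %")
```

The more replicates that are averaged, the less the estimate
is disturbed by random measurement error.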

precision 

measurement precision. 

closeness of agreement between indications or measured 
quantity values obtained by replicate measurements on the 
same or similar objects under specified conditions. 

Note 1 Measurement precision is usually expressed 
numerically by measures of imprecision, such as standard 
deviation, variance, or coefficient of variation under the 
specified conditions of measurement. 

Note 2 The “specified conditions” can be, for example, 
repeatability conditions of measurement, intermediate 
precision conditions of measurement, or reproducibility 
conditions of measurement (see ISO 5725-3:1994). 

Note 3 Measurement precision is used to define 
measurement repeatability, intermediate measurement 
precision, and measurement reproducibility. 

Note 4 Sometimes “measurement precision” is erroneously 
used to mean measurement accuracy. 
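
The measures of imprecision named in Note 1 can be computed
as follows. This is a sketch using the Python standard
library; the replicate values are hypothetical and are assumed
to come from one specified set of conditions.

```python
from statistics import mean, stdev, variance


def imprecision(values):
    """Return mean, sample SD, sample variance, and CV (%) of replicate results."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return {
        "mean": m,
        "sd": s,
        "variance": variance(values),
        "cv_percent": 100.0 * s / m,
    }


# Hypothetical replicate results obtained under repeatability conditions.
result = imprecision([10.0, 10.2, 9.8, 10.1, 9.9])
print(result)
```

The same function applies to repeatability, intermediate
precision, or reproducibility data; only the conditions under
which the replicates were collected differ.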

intermediate measurement precision 

intermediate precision. 

measurement precision under a set of intermediate precision
conditions of measurement.

intermediate precision condition of measurement 

intermediate precision condition. 

condition of measurement, out of a set of conditions that 
includes the same measurement procedure, same location, 
and replicate measurements on the same or similar objects 
over an extended period of time, but may include other
conditions involving changes.

Note 1 The changes can include new calibrations, calibrators, 
operators, and measuring systems. 

Note 2 A specification for the conditions should contain the 
conditions changed and unchanged, to the extent practical. 

Note 3 In chemistry, the term “interserial precision condition 
of measurement” or “between series imprecision” is 
sometimes used to designate this concept. 

repeatability 

measurement repeatability. 

measurement precision under a set of repeatability
conditions of measurement.

repeatability condition of measurement 

repeatability condition. 

condition of measurement, out of a set of conditions that 
includes the same measurement procedure, same operators, 
same measuring system, same operating conditions and same 
location, and replicate measurements on the same or similar 
objects over a short period of time. 

Note 1 A condition of measurement is a repeatability 
condition only with respect to a specified set of repeatability 
conditions. 

Note 2 In chemistry, the term “intraserial precision condition 
of measurement” or “within series imprecision” is sometimes 
used to designate this concept. 

reproducibility 

measurement reproducibility. 

measurement precision under reproducibility conditions of 
measurement. 

reproducibility condition of measurement
reproducibility condition. 

condition of measurement, out of a set of conditions that 
includes different locations, operators, measuring systems, 
and replicate measurements on the same or similar objects. 

Note 1 The different measuring systems may use different 
measurement procedures. 

Note 2 A specification should give the conditions changed
and unchanged, to the extent practical. 

UNCERTAINTY CONCEPT AND 
UNCERTAINTY BUDGET 


uncertainty 

uncertainty of measurement. 

measurement uncertainty. 

nonnegative parameter characterizing the dispersion of the 
quantity values being attributed to a measurand, based on the 
information used. 

Note 1 Measurement uncertainty includes components 
arising from systematic effects, such as components associated 
with corrections and the assigned quantity values of 
measurement standards, as well as the definitional 
uncertainty. Sometimes estimated systematic effects are not 
corrected for but, instead, associated measurement 
uncertainty components are incorporated. 

Note 2 The parameter may be, for example, a standard 
deviation called standard measurement uncertainty (or a 
specified multiple of it), or the half-width of an interval, 
having a stated coverage probability. 

Note 3 Measurement uncertainty comprises, in general, 
many components. Some of these may be evaluated by Type A 
evaluation of measurement uncertainty from the statistical 
distribution of the quantity values from series of 
measurements and can be characterized by standard 
deviations. The other components, which may be evaluated by 
Type B evaluation of measurement uncertainty, can also be 
characterized by standard deviations, evaluated from 
probability density functions based on experience or other 
information. 
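
The two types of evaluation in Note 3 can be illustrated with
a short sketch. It assumes that the Type A component is the
standard deviation of the mean of n replicates, and that the
Type B components come from a rectangular or a triangular
distribution with a known half-width. The function names and
numbers are hypothetical.

```python
import math
from statistics import stdev


def type_a_standard_uncertainty(values):
    """Type A: standard deviation of the mean of replicate measurements."""
    return stdev(values) / math.sqrt(len(values))


def type_b_rectangular(half_width):
    """Type B: standard uncertainty of a rectangular distribution of half-width a."""
    return half_width / math.sqrt(3)


def type_b_triangular(half_width):
    """Type B: standard uncertainty of a triangular distribution of half-width a."""
    return half_width / math.sqrt(6)


# Hypothetical replicate readings and a stated tolerance of +/- 0.3 units.
u_a = type_a_standard_uncertainty([10.0, 10.2, 9.8, 10.1, 9.9])
u_rect = type_b_rectangular(0.3)
u_tri = type_b_triangular(0.3)
print(f"u_A = {u_a:.4f}, u_rect = {u_rect:.4f}, u_tri = {u_tri:.4f}")
```

Either way, each component ends up as a standard deviation,
which is what allows components of both types to be combined.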

uncertainty budget

statement of a measurement uncertainty, of the components 
of that measurement uncertainty, and of their calculation and 
combination. 

Note

An uncertainty budget should include the measurement 
model, estimates, and measurement uncertainties associated 
with the quantities in the measurement model, covariance, 
type of applied probability density functions, degrees of freedom,
type of evaluation of measurement uncertainty, and any
coverage factor. 
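
A minimal budget can be sketched as a list of components that
are combined in quadrature. The sketch assumes uncorrelated
input quantities (no covariance terms) and a linear
measurement model, so that each component is a sensitivity
coefficient multiplied by a standard uncertainty. The
component names and numbers are hypothetical.

```python
import math


def combined_standard_uncertainty(components):
    """Combine uncorrelated components given as (coefficient, standard uncertainty)."""
    return math.sqrt(sum((c * u) ** 2 for c, u in components))


# Hypothetical budget: name -> (sensitivity coefficient, standard uncertainty).
budget = {
    "calibrator value": (1.0, 0.3),
    "repeatability": (1.0, 0.4),
}

u_c = combined_standard_uncertainty(budget.values())
for name, (c, u) in budget.items():
    share = 100.0 * (c * u) ** 2 / u_c ** 2  # contribution to the combined variance
    print(f"{name}: u = {u}, share of variance = {share:.0f} %")
print(f"combined standard uncertainty = {u_c:.2f}")
```

Listing the share of each component makes it visible which
input quantity dominates the combined uncertainty.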

input quantity in a measurement model 
input quantity. 

quantity that must be measured, or a quantity, the value of 
which can be otherwise obtained, in order to calculate a
measured quantity value of a measurand.

influence quantity 

quantity that, in a direct measurement, does not affect the 
quantity that is actually measured, but affects the relation 
between the indication and the measurement result. 

Note 

An indirect measurement involves a combination of direct 
measurements, each of which may be affected by influence 
quantities. 

Example 

Amount-of-substance concentration of bilirubin in a direct 
measurement of hemoglobin amount-of-substance
concentration in human blood plasma.

output quantity in a measurement model 
output quantity. 

quantity, the measured value of which is calculated using 
the values of input quantities in a measurement model. 

expanded measurement uncertainty 
expanded uncertainty. 

product of a combined standard measurement uncertainty 
and a factor larger than the number one. 

Note

The factor depends upon the type of probability distribution 
of the output quantity in a measurement model and on 
the selected coverage probability. 


coverage interval 

interval containing the set of true quantity values of a
measurand with a stated probability, based on the information
available.

Note 1 A coverage interval should not be termed “confidence
interval” to avoid confusion with the statistical concept. 

Note 2 A coverage interval can be derived from an expanded 
measurement uncertainty. 


coverage probability 

probability that the set of true quantity values of a
measurand is contained within a specified coverage interval.

Note 

The coverage probability is also termed “level of
confidence” in the GUM.


coverage factor 

number larger than 1 by which a combined standard
measurement uncertainty is multiplied to obtain an expanded
measurement uncertainty.

Note 

A coverage factor is symbolized k. 
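
The relation between coverage factor, expanded uncertainty,
and coverage interval can be sketched as follows. The sketch
assumes that the output quantity is normally distributed, in
which case k = 2 corresponds to a coverage probability of
about 95 %. The function names are hypothetical.

```python
from statistics import NormalDist


def coverage_factor_normal(coverage_probability):
    """Coverage factor k for a normally distributed output quantity."""
    return NormalDist().inv_cdf(0.5 + coverage_probability / 2.0)


def expanded_uncertainty(u_c, k=2.0):
    """Expanded uncertainty U = k * u_c, with k larger than 1."""
    if k <= 1:
        raise ValueError("the coverage factor must be larger than 1")
    return k * u_c


def coverage_interval(value, u_c, k=2.0):
    """Symmetric coverage interval: value - U to value + U."""
    big_u = expanded_uncertainty(u_c, k)
    return (value - big_u, value + big_u)


k95 = coverage_factor_normal(0.95)
low, high = coverage_interval(10.0, 0.5, k=2.0)
print(f"k(95 %) = {k95:.2f}; interval = {low} to {high}")
```

For other distributions, or for few degrees of freedom, a
different coverage factor applies for the same coverage
probability.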

sensitivity

sensitivity of a measuring system. 

quotient of the change in an indication of a measuring
system and the corresponding change in a value of a quantity
being measured.

Note 1 Sensitivity of a measuring system can depend on the 
value of the quantity being measured. 

Note 2 The change considered in a value of a quantity being
measured must be large compared with the resolution. 
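
As a small illustration, sensitivity can be estimated from two
points as the change in indication divided by the change in
the quantity value. The readings below are hypothetical, and
the two quantity values are assumed to be far enough apart
compared with the resolution.

```python
def sensitivity(indication_1, indication_2, quantity_1, quantity_2):
    """Change in indication divided by the corresponding change in quantity value."""
    delta_quantity = quantity_2 - quantity_1
    if delta_quantity == 0:
        raise ValueError("the two quantity values must differ")
    return (indication_2 - indication_1) / delta_quantity


# Hypothetical readings: absorbance 0.10 at 2.0 mmol/L and 0.35 at 7.0 mmol/L.
s = sensitivity(0.10, 0.35, 2.0, 7.0)
print(f"sensitivity = {s:.3f} absorbance units per mmol/L")
```

If the response is not linear, the result depends on which
two points are chosen, as Note 1 indicates.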

resolution 

smallest change in a quantity being measured that causes a 
perceptible change in the corresponding indication. 

Note 

Resolution can depend on, for example, noise (internal 
or external) or friction. It may also depend on the value of 
a quantity being measured. 


detection limit 
limit of detection. 

measured quantity value, obtained by a given measurement
procedure, for which the probability of falsely claiming the
absence of a component in a material is β, given a probability
α of falsely claiming its presence.

Note 1 IUPAC recommends default values for α and β
equal to 0.05.

Note 2 The abbreviation LOD is sometimes used. 

Note 3 The term “sensitivity” is discouraged for this concept. 
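
One common way to turn this definition into a number is
sketched below. It is not prescribed by the definition itself
and rests on assumptions: results are normally distributed,
and the standard deviation is known and the same at the blank
level and at the detection limit. The function name and
parameters are hypothetical.

```python
from statistics import NormalDist


def detection_limit(sd_blank, alpha=0.05, beta=0.05, blank_mean=0.0):
    """Detection limit for normal results with constant, known standard deviation."""
    z = NormalDist()
    # Critical value: results above it are declared "present" (false-positive risk alpha).
    critical_value = blank_mean + z.inv_cdf(1.0 - alpha) * sd_blank
    # Detection limit: a true value here is missed with probability beta.
    return critical_value + z.inv_cdf(1.0 - beta) * sd_blank


lod = detection_limit(sd_blank=1.0)
print(f"LOD = {lod:.2f} standard deviations above the blank mean")
```

With α = β = 0.05 this gives about 3.3 standard deviations
above the blank mean.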


verification 

provision of objective evidence that a given item fulfils
specified requirements.

Example 1 

Confirmation that performance properties or legal
requirements of a measuring system are achieved.

Example 2 

Confirmation that a target measurement uncertainty can 
be met. 

Note 1 When applicable, measurement uncertainty should 
be taken into consideration. 

Note 2 The item may be, e.g., a process, measurement 
procedure, material, compound, or measuring system. 

Note 3 Verification should not be confused with calibration.
Not every verification is a validation.





validation 

verification, where the specified requirements are adequate 
for an intended use. 

Example 

A measurement procedure, ordinarily used for the
measurement of mass concentration of nitrogen in water, may be
validated also for measurement in human serum.


interval 

indication interval. 

set of quantity values bounded by extreme possible 
indications. 

Note 

An indication interval is usually stated in terms of its
smallest and greatest quantity values, for example “15-25 mL.”

range of a nominal indication interval

absolute value of the difference between the extreme
quantity values of a nominal indication interval.

Example 

For a nominal indication interval of 15-25 mL, the range of 
the nominal indication interval is 10 mL. 



Further Reading 


There is an abundance of statistical text related to
measurements in laboratories. Some textbooks that have appealed
particularly to the author are listed below.

Several Internet sites provide useful information but are not

Dybkaer R. An Ontology of Property for Physical, Chemical and Biological
Systems, http://ontology.iupac.org. Accessed 2013-08-25

made since, http://www.itl.nist.gov/div898/handbook/. Accessed

ISO 5725-2: Accuracy (Trueness and Precision) of measurement methods

Coverage interval, 129
Coverage probability, 129 
Cumulative distribution analysis, 107 

D 

Data transformation, 11 
Decision value, 109 
Deming orthogonal regression, 77 
Dependent variable, 71 
Derivative, 3 
Derivative, partial, 3
Detection limit, 130 
Deviation, absolute, 35-36 
Deviation, average, 35-36 
Diagnostic sensitivity, 101 
Difference, minimum, 39, 113 
Difference, relative, 113 
Dixon test, 96 

Drawing normal distribution, 17 

E 

Efficiency, 103 

Empirical distribution function, 100-101 
Error propagation, 27, 36-37 
Error term, 84-85 
Eurachem Guide, 36-37 
Even ordered numbers, 29 
Expanded uncertainty, 38, 128 
Experimental event odds, 114 
Exponents, 1-2 

F 

Falsely accepting, 110-111 

Falsely accepting null hypothesis, 110-111 

Falsely claiming absence, 130 

Falsely claiming presence, 130 

FINV(prob,df1,df2), 65

Fisher's exact test, 45 

Fisher's transformation, 91 

Fit-for-purpose, 118 

Frequency table, 101, 115 

Friedman's test, 53-54 

F-test, 51, 64-65 

G 

GAMMALN, 19-20 

Gauss (normal) distribution, 6 

Goodness of fit, 43f

Grand mean, 49 

Grand mean, weighted, 50 

Group mean, 49 

Grubbs' test, 95 

H 

Harmonic average number, 55 
Histogram, 5-6 
Hypotenuse, 3-4 

I 

Independent variable, 71 
Index of agreement, 117-118 


Index of individuality, 39 

Index of validity, 103 

Individuality index, 39-40

Inflexion point, 3 

Influence point, 96 

Influence quantity, 128 

Input quantity, 128 

Intercept, uncertainty DLR, 80 

Intercept, uncertainty OLR, 74

Intermediary precision, 54 

Intermediate measurement precision, 125 

Intermediate precision condition, 126 

Interquartile interval (IQR), 30 

Interval, 131 

Interval scale, 5 

IQR. See Interquartile interval (IQR) 

IU, 122 

J 

J-index, 106-107

K 

k, coverage factor, 38 
K-index, 109 
Kind of quantity, 121 
Kolmogorov-Smirnov, 31
Kruskal-Wallis, 53-54 
κ-statistics, 115
Kurtosis, 10 

L 

Least square regression, 72 
Least squares, 75 
Leptokurtic, 10 
Level of confidence, 38, 129 
Leverage, 96 

Likelihood ratio, 103, 106 
Limit of detection, 130 
Linearity, 84 
Linear least square, 72
LOD, 130 
Logarithm, 1-2 
Logit, 32 
LR, 103 

M 

MAD, 35 

Manhattan median, 57 
Mann-Whitney U-test, 62, 68-69
Margin of error, 111
Marker validation, 118 
Matrix, 124

MD. See Minimum difference (MD) 
Mean 

arithmetic, 13 
geometric, 13 
harmonic, 14 
quadratic, 20 
trimmed, 33-34 
weighted, 14 
winsorized, 34 

Mean absolute deviation, 35-36 
Mean squared error (MSE), 34-35 
Mean squares, 54 
Measurand, 127 
Measurand, definition, 123 
Measurement accuracy, 124 
Measurement, definition, 122-123 
Measurement precision, 125 
Measurement reproducibility, 126 
Measurement trueness, 125 
Measurement uncertainty, 127 
Measuring instrument, 123 
Measuring system, 123 
Median, 28 

Median absolute deviation, 35 
Median, Manhattan, 57 
Minimum difference (MD), 39, 113
Mode, 29 
MOM, 35 

Moment, second and third, 8 
MSE. See Mean squared error (MSE) 
MU. See uncertainty 
Multimodal, 29 
Multiple of median, 35 

N 

NNH. See Number needed to harm (NNH) 
NNT. See Number needed to treat (NNT) 
Nominal scale, 5 
Normality, 31 
Normality test, 31 
NORMDIST(z), 17 
NORMSDIST(z), 27-28, 96 
NORMSINV(probability), 17, 32, 112 
Number needed to harm (NNH), 115 
Number needed to treat (NNT), 115 
Number of events, 28 

O

Odd ordered numbers, 28 
Odds 


control event, 114 
definition, 103 
experimental event, 114 
Odds ratio, 114 

OLR. See Ordinary linear regression 
(OLR) 

Ordered numbers, 28 
Ordinal quantity, 124 
Ordinal scale, 5 

Ordinary linear regression (OLR), 72 
Outlier, 35, 95 
Output quantity, 128 

P 

PABAK, 117 

Paired samples, observations, 60, 66 
Passing-Bablok regression, 82-83 
Peakedness, 10 

Pearson correlation coefficient, 85-86 
Pearson skewness, 9 
Percentile, 28, 29 
Performance characteristics, 101 
Pearson skewness index, 10
Platykurtic, 10 
Poisson distribution, 28 
Post-test odds, 104 
Post-test probability, 104 
Power of a test, 110-111 
Precision comparison, 47 
Precision, definition, 125 
Predicted value, variance, 97 
Pre-test odds, 104 
Prevalence Index, 116-117 
Prevalence of disease, 103 
Probability, 103 
Probability density function, 6 
Probit function, 31 
Propagation, 36-37 
Property, nominal, 122 
Proportional, 104-105 
Proportion of agreement, 115-116 
Proportion of successes, 25 
Proportion, standard error, 25 
Pure between run, 54 
Pythagoras' theorem, 3-4, 109

Q 

Q-Q plot, 100-101 
Q-test, 96 
Quantile, 30-31 
Quantile-Quantile, 31 
Quantity, 121
comparison of, 98 
ordinal, 124 
value, 122 

Quartile skewness, 8-9 

R 

Radian, 4 

Random numbers, 33 
Range, 131 
RANK function, 66 
Ratio scale, 5 

RCV. See Reference change value (RCV) 
Recalibration, 81 

Receiver operating characteristics (ROC), 106

Rectangular distribution, 10, 40 
Reference change value (RCV), 39 
Reference value, 109 
Regression and correlation, 71-85 
Regression coefficient, 72 
Regression function, 72 
Relative difference, 113 
Relative risk, 114 

Relative standard deviation (RSD), 21, 23-24

Repeatability, 126 
Repeatability condition, 127 
Reproducibility, 126 
Reproducibility condition, 127 
Residual, 73, 84-85 
Resolution, 130 
Resolve ties, 90 
RiLiBAEK, 21 
Risk ratio, 114 
Robust estimators, 28-48
ROC. See Receiver operating 
characteristics (ROC) 

Root mean square (RMS), 20, 21 
RSD. See Relative standard deviation 
(RSD) 

r, significance of, 91 
Rule of three, 48 
Rule of thumb, 39 

S

Sample size, 111

Second skewness, Pearson, 9 

SEM, 24 

Sensitivity, diagnostic, 101, 102-103 
Sensitivity of a measuring system, 129 


SE proportion, Wilson method, 26 
Significance of r, 91 
Sign test, 60 
SIN(A), 4

Single sample t-test, 61
Sinus wave, 20 
Skewness, 8 

Skewness index, Pearson, 10 
Slope, 74-75 
Slope of 'X on Y', 76-77 
Slope, uncertainty DLR, 80 
Slope, uncertainty OLR, 74 
Spearman's rank correlation, 89-91 
Specificity, diagnostic, 102-103 
Standard ANOVA, 49f, 56f 
Standard deviation 
based on MAD, 35 
CI of, 18
Dahlberg, 23 
duplicates, 23 
geometric, 24 
Poisson distribution, 28 
pooled, 22 
population, 20 
relative, 21 
relative, percent, 21 
residuals, 74, 84-85 
sample, 15 
short cut, 17 

Standard error, correlation coefficient, 91 
Standard error of proportion, 25 
Standard error of the mean, 24 
Standard uncertainty 
Gaussian distribution, 42 
rectangular distribution, 40 
triangular distribution, 41 
Student independent, 60 
Student's t-test, 59-65 
Sum of squares, 16, 49 
SUMSQ, 16 
System, 124 

T 

TAN(A), 4 
t-distribution, 10
Test statistic, 69 
Ties, 67 

Traceability, 74 
Transformation, Fisher's, 91 
Transformation of data, 11 
Triangular distribution, 41 
Trigonometric functions, 3-5
Trimmed mean, 33 
Trueness, definition, 125 
Tukey's quick test, 62 
Two-point calibration, 81 
Type A estimates, 40 
Type B estimates, 40 
Type I error, 109 
Type II error, 110 

U

Unbalanced ANOVA, 49 
Unbiased estimate between group 
variance, 55 

Unbiased estimator, 16-17 
Uncertainty budget, 38, 128 
Uncertainty, difference between 
proportions, 27 

Uncertainty of measurement, 127 
Uncertainty propagation, 36-38 
Uniform distribution, 40 
Unimodal, 29 

V 

Validation, 131 


Validation of marker, 118 
Variance, 19 
comparison, 64-65
components, 54 
predicted value, 97 
Verification, 130 

W

Weighted grand mean, 50 
Welch-Satterthwaite, 61-62 
Welch test, 61 

WHO International Unit, 122 
Wilcoxon signed rank test, 60, 66 
Wilson method, SE proportion, 26 
Winsorized mean, 33 
Within series imprecision, 127 

Y 

Yates' continuity correction, 45 
Youden index, 106-107 
Youden plot, 57 

Z

z-score, 27, 96 



